Multi-stage production pipeline system

ABSTRACT

Multi-stage production pipeline system that may be utilized in conjunction with a motion picture project management system. The multi-stage production pipeline system includes a computer and a database. The database includes metadata associated with at least one shot or associated with regions within the plurality of images in the at least one shot, or both. The computer includes a grouping tool interface for presenting user interface elements and accepting input of the metadata associated with the at least one shot or regions within the plurality of images in the at least one shot, or both. The system enables a large studio workforce to work non-linearly on a film while maintaining a unified vision driven by key creative figures, allowing for more consistent, higher quality, faster, less expensive work product and more efficient project management techniques. The system also enables reuse of project files, masks and other production elements across projects.

This application is a continuation of U.S. Utility patent application Ser. No. 13/895,979, filed 16 May 2013, which is a continuation in part of U.S. Utility patent application Ser. No. 13/366,899, filed 6 Feb. 2012, the specifications of which are hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

One or more embodiments of the invention are related to the field of motion picture production and project management in the motion picture industry and relate to reviewers and production managers who manage artists. Production managers are also known as “production” for short. Artists utilize image analysis and image enhancement and computer graphics processing, for example to convert two-dimensional images into three-dimensional images associated with a motion picture or otherwise create or alter motion pictures. More particularly, but not by way of limitation, one or more embodiments of the invention enable a multi-stage production pipeline system that may be utilized in conjunction with a motion picture project management system that includes metadata associated with at least one shot or associated with regions within the plurality of images in the at least one shot, or both. The system enables a large studio workforce to work non-linearly on a film while maintaining a unified vision driven by key creative figures, allowing for more consistent, higher quality, faster, less expensive work product and more efficient project management techniques. The system also enables reuse of project files, masks and other production elements across projects and efficient management of projects related to motion pictures to manage assets, control costs, predict budgets and profit margins, reduce archival storage and otherwise provide displays tailored to specific roles to increase worker efficiency.

2. Description of the Related Art

Known methods for the colorizing of black and white feature films involve the identification of gray scale regions within a picture followed by the application of a pre-selected color transform or lookup tables for the gray scale within each region defined by a masking operation covering the extent of each selected region and the subsequent application of said masked regions from one frame to many subsequent frames. The primary difference between U.S. Pat. No. 4,984,072, System And Method For Color Image Enhancement, and U.S. Pat. No. 3,705,762, Method For Converting Black-And-White Films To Color Films, is the manner by which the regions of interest (ROIs) are isolated and masked, how that information is transferred to subsequent frames and how that mask information is modified to conform with changes in the underlying image data. In the U.S. Pat. No. 4,984,072 system, the region is masked by an operator via a one-bit painted overlay and operator manipulated using a digital paintbrush method frame by frame to match the movement. In the U.S. Pat. No. 3,705,762 process, each region is outlined or rotoscoped by an operator using vector polygons, which are then adjusted frame by frame by the operator, to create animated masked ROIs. Various masking technologies are generally also utilized in the conversion of 2D movies to 3D movies.

In both systems described above, the color transform lookup tables and regions selected are applied and modified manually to each frame in succession to compensate for changes in the image data that the operator detects visually. All changes and movement of the underlying luminance gray scale are subjectively detected by the operator and the masks are sequentially corrected manually by the use of an interface device such as a mouse for moving or adjusting mask shapes to compensate for the detected movement. In all cases the underlying gray scale is a passive recipient of the mask containing pre-selected color transforms with all modifications of the mask under operator detection and modification. In these prior inventions the mask information does not contain any information specific to the underlying luminance gray scale and therefore no automatic position and shape correction of the mask to correspond with image feature displacement and distortion from one frame to another is possible.

Existing systems that are utilized to convert two-dimensional images to three-dimensional images may also require the creation of wire frame models for objects in images that define the 3D shape of the masked objects. The creation of wire frame models is a large undertaking in terms of labor. These systems also do not utilize the underlying luminance gray scale of objects in the images to automatically position and correct the shape of the masks of the objects to correspond with image feature displacement and distortion from one frame to another. Hence, great amounts of labor are required to manually shape and reshape masks for applying depth or Z-dimension data to the objects. Motion objects that move from frame to frame thus require a great deal of human intervention. In addition, there are no known solutions for enhancing two-dimensional images into three-dimensional images that utilize composite backgrounds of multiple images in a frame for spreading depth information to background and masked objects. This includes data from background objects whether or not pre-existing or generated for an occluded area where missing data exists, i.e., where motion objects never uncover the background. In other words, known systems gap fill using algorithms for inserting image data where none exists, which causes artifacts.

Current methods for converting movies from 2D to 3D that include computer-generated elements or effects generally utilize only the final sequence of 2D images that make up the movie. This is the current method used for conversion of all movies from two-dimensional data to left and right image pairs for three-dimensional viewing. There are no known current methods that obtain and make use of metadata associated with the computer-generated elements for a movie to be converted. This is the case since studios that own the older 2D movies may not have retained intermediate data for a movie, i.e., the metadata associated with computer-generated elements, since the amount of data in the past was so large that the studios would only retain the final movie data with rendered computer graphics elements and discard the metadata. For movies having associated metadata that has been retained (i.e., intermediate data associated with the computer-generated elements such as mask, or alpha and/or depth information), use of this metadata would greatly speed the depth conversion process.

In addition, typical methods for converting movies from 2D to 3D in an industrial setting capable of handling the conversion of hundreds of thousands of frames of a movie with large amounts of labor or computing power make use of an iterative workflow in a linear manner that does not take into account common elements that exist, for example, in non-adjacent scenes. Since work is generated on a scene basis, the resulting work product is typically non-consistent, for example with colors and/or depths for a given object that appears in different scenes. The iterative workflow includes masking objects in each frame, adding depth and then rendering the frame into left and right viewpoints forming an anaglyph image or a left and right image pair. If there are errors in the edges of the masked objects for example, then the typical workflow involves an “iteration”, i.e., sending the frames back to the workgroup responsible for masking the objects (which can be in a country with cheap unskilled labor half way around the world), after which the masks are sent to the workgroup responsible for rendering the images (again potentially in another country), after which the rendered image pair is sent back to the quality assurance group. It is not uncommon in this workflow environment for many iterations of a complicated frame to take place. This is known as “throw it over the fence” workflow since different workgroups work independently to minimize their current work load and not as a team with overall efficiency in mind. With hundreds of thousands of frames in a movie, the amount of time that it takes to iterate back through frames containing artifacts can become high, causing delays in the overall project.
Even if the re-rendering process takes place locally, the amount of time to re-render or ray-trace all of the images of a scene can cause significant processing and hence delays on the order of at least hours. Elimination of iterations such as this would provide a huge savings in wall-time, or end-to-end time that a conversion project takes, thereby increasing profits and minimizing the workforce needed to implement the workflow.

General simplistic project management concepts are known, however the formal and systematic application of project management in engineering projects of large complexity began in the mid-1900's. Project management in general involves at least planning and managing resources and workers to complete a temporary activity known as a project. Projects are generally time oriented and also constrained by scope and budget. Project management was first described in a systematic manner by Frederick Winslow Taylor and his students Henry Gantt and Henri Fayol. Work breakdown structure and Gantt charts were used initially and Critical Path Method “CPM” and Program Evaluation and Review Technique “PERT” were later developed in industrial and defense settings respectively. Project cost estimating followed these developments. Basic project management generally includes initiation, project planning, execution, monitor/control and completion. More complex project management techniques may attempt to achieve other goals, such as ensuring that the management process is defined, quantitatively managed and optimized, for example as is described in the Capability Maturity Model Integration approach.

As described above, industrial based motion picture projects typically include hundreds of thousands of frames, however in addition, these types of projects may also utilize gigantic amounts of storage including potentially hundreds of layers of masks and images per frame and hundreds of workers. These types of projects have been managed in a fairly ad hoc manner to date in which costs are difficult to predict, and controlled feedback to redirect a project toward financial success, asset management and most other best practice project management techniques are minimally utilized. In addition, project management tools utilized include off the shelf project management tools that are not tailored for the specifics of project management in a unique vertical industry such as motion picture effects and conversion projects. Hence, predicting costs and quality and repeatedly performing projects in the film industry has been difficult to accomplish to date. For example, existing motion picture projects sometimes require three people to review an edited frame in some cases, e.g., a person to locate the resource amongst a large number of resources, a person to review the resource and another person to provide annotations for feedback and rework. Although standalone tools exist to perform these tasks, they are generally not integrated and are difficult for personnel in different roles to utilize. In addition, since the work is generally performed in a linear manner, there is no re-use of masks or other items in scenes that are not in linear time order and hence, the project costs are higher and resulting work product is inconsistent and of lower quality than would be achieved if the common elements in different scenes could be worked on and managed using metadata associated with the scenes or regions within the scenes.

Regardless of the known techniques, there are no known non-linear workflows for 2D to 3D conversion or special effects projects that enable reuse of masks and other items common to objects that appear in non-linear time sequence. In addition, there are no known optimizations or implementations of project management solutions that take into account the unique requirements of the motion picture industry. Hence there is a need for a motion picture project management system.

BRIEF SUMMARY OF THE INVENTION

Embodiments of the invention generally are directed at a multi-stage production pipeline system that may be utilized in conjunction with a motion picture project management system. One or more embodiments of the invention enable a computer and database to be configured to accept metadata associated with at least one shot or associated with regions within the plurality of images in the at least one shot, or both. The system enables a large studio workforce to work non-linearly on a film while maintaining a unified vision driven by key creative figures, allowing for more consistent, higher quality, faster, less expensive work product and more efficient project management techniques. The system also enables reuse of project files, masks and other production elements across projects and greatly improves project management related to the production, processing or conversion of motion pictures. Large motion picture projects generally utilize workers of several roles to process each image that makes up a motion picture, which may number in the hundreds of thousands of image frames. One or more embodiments of the invention further enable a computer and database to be configured to accept the assignment of tasks related to artists, time entries for tasks by artists and review of time and actuals of artists by coordinators, a.k.a. “production”, and the review of work product by editorial roles. The system thus enables artists working on shots made up of multiple images to be managed for successful on-budget completion of projects, along with minimization of generally vast storage requirements for motion picture assets, and enables prediction of costs for future bidding on projects given quality, ratings of workers to use and schedule.

Tasks involved in a motion picture project generally include tasks related to assessment of a project, ingress of a project, assignment of tasks, performing the assigned tasks or project work, reviewing the work product and archiving and shipping the work product of the project. One or more embodiments of the invention enable workers of different “roles” to view project tasks in a manner consistent with and which aids their role. This is unknown in the motion picture industry. Roles may be utilized in one or more embodiments of the invention that include “editorial”, “asset manager”, “visual effects supervisor”, “coordinator” or “production”, “artist”, “lead artist”, “stereographer”, “compositor”, “reviewer”, “production assistant”. In a simpler sense for ease of illustration, three general categories relate to production workers that manage artists, artist workers who perform the vast majority of work product related work and editorial workers who review and provide feedback based on the work product. Each of these roles may utilize a unique or shared view of the motion picture image frames and/or information related to each image or other asset that their role is assigned to work on.

General Workflow for the Assessment Phase

Generally, the editorial and/or asset manager and/or visual effects supervisor roles utilize a tool that shows the motion picture on a display of a computer. The tool, for example, enables the various roles involved in this phase to break a motion picture down into scenes or shots to be worked on. One such tool includes “FRAME CYCLER®”, commercially available from ADOBE®.

General Workflow for the Ingest Phase

Generally, the asset manager enters the various scene breaks and other resources such as alpha masks, computer generated element layers or any other resources associated with scenes in the motion picture into a database. Any type of database may be utilized in one or more embodiments of the invention. One such tool that may be utilized to store information related to the motion picture and the assets for project management includes the project management database “TACTIC™”, which is commercially available from SOUTHPAW TECHNOLOGY Inc.™. Any database may be utilized in one or more embodiments of the invention so long as the motion picture specific features are included in the project management database. One or more embodiments of the invention update the “snapshot” and “file” tables in the project management database. The schema of the project management database is briefly described in this section and described in more detail in the detailed description section below.

General Workflow for the Assignment Phase

Generally, production workers utilize an interface that couples with the project management database to assign particular workers to particular tasks associated with their role and assign the workers images associated with shots or scenes in a given motion picture. One or more embodiments of the invention make use of basic project management database digital asset management tables and add additional fields that improve upon basic project management functionality to optimize the project management process for the motion picture industry. One or more embodiments of the invention update the “task” table in the project management database.

General Workflow for the Project Work Phase

Generally, artists, stereographers and compositors perform a large portion of the total work on a motion picture. These roles generally utilize a time clock tool to obtain their tasks and set task status and start and stop times for the task. Generally, artists perform mask and region design and initial depth augmentation of a frame. The artists generally utilize a ray tracing program that may include automated mask tracking capabilities for example, along with NUKE™, commercially available from THE FOUNDRY™, for mask cleanup for example. Once a client approves the visual effects and/or depth work on a scene, then compositors finish the scene with the same tools that the artists use and generally with other tools such as AFTER EFFECTS® and PHOTOSHOP®, commercially available from ADOBE®. In one or more embodiments of the invention, the person who worked on a particular asset is stored in the project management database, in custom fields for example.
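The assignment and time clock steps described above can be sketched against a toy database. This is only an illustration under stated assumptions: the column names (`shot`, `role`, `assignee`, `status`, `bid_hours`, `actual_hours`) and the helper functions are hypothetical and are not the schema of TACTIC™ or of any embodiment; only the table name “task” comes from the text.

```python
import sqlite3

def open_project_db():
    """Create an in-memory stand-in for the project management database."""
    db = sqlite3.connect(":memory:")
    # Hypothetical columns; only the table name "task" appears in the text.
    db.execute(
        """CREATE TABLE task (
               id INTEGER PRIMARY KEY,
               shot TEXT NOT NULL,
               role TEXT NOT NULL,
               assignee TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'assigned',
               bid_hours REAL NOT NULL,
               actual_hours REAL NOT NULL DEFAULT 0
           )"""
    )
    return db

def assign_task(db, shot, role, assignee, bid_hours):
    """Production assigns a worker to a task on a shot; returns the task id."""
    cur = db.execute(
        "INSERT INTO task (shot, role, assignee, bid_hours) VALUES (?, ?, ?, ?)",
        (shot, role, assignee, bid_hours),
    )
    return cur.lastrowid

def clock_time(db, task_id, start_hour, stop_hour, status="in progress"):
    """Record one start/stop time entry and the new task status."""
    db.execute(
        "UPDATE task SET actual_hours = actual_hours + ?, status = ? WHERE id = ?",
        (stop_hour - start_hour, status, task_id),
    )

def actuals_vs_bid(db, shot):
    """Return (total actual hours, total bid hours) for one shot."""
    return db.execute(
        "SELECT SUM(actual_hours), SUM(bid_hours) FROM task WHERE shot = ?",
        (shot,),
    ).fetchone()

db = open_project_db()
task_id = assign_task(db, "shot_010", "artist", "worker_a", 8.0)
clock_time(db, task_id, 9.0, 12.5)
clock_time(db, task_id, 13.0, 15.0, status="review")
print(actuals_vs_bid(db, "shot_010"))
```

Keeping actuals next to the bid on the same row is one simple way a coordinator view could compare time spent against budget per shot.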

In specific workflow scenarios, workers in region design for example classify elements in scenes into two separate categories. Scenes generally include two or more images in time sequence for example. The two categories include background elements (i.e., sets and foreground elements that are stationary) or motion elements (e.g., actors, automobiles, etc.) that move throughout the scene. These background elements and motion elements are treated separately in embodiments of the invention similar to the manner in which traditional animation is produced. In addition, many movies now include computer-generated elements (also known as computer graphics or CG, or also as computer-generated imagery or CGI) that include objects that do not exist in reality, such as robots or spaceships for example, or which are added as effects to movies, for example dust, fog, clouds, etc. Computer-generated elements may include background elements, or motion elements.

Motion Elements: The motion elements are displayed as a series of sequential tiled frame sets or thumbnail images complete with background elements. The motion elements are masked in a key frame using a multitude of operator interface tools common to paint systems as well as unique tools such as relative bimodal thresholding in which masks are applied selectively to contiguous light or dark areas bifurcated by a cursor brush. After the key frame is fully designed and masked, the mask information from the key frame is then applied to all frames in the display using mask fitting techniques that include:

1. Automatic mask fitting using Fast Fourier Transform and Gradient Descent Calculations based on luminance and pattern matching which references the same masked area of the key frame followed by all prior subsequent frames in succession. Since the computer system implementing embodiments of the invention can reshape at least the outlines of masks from frame to frame, large amounts of labor can be saved from this process that traditionally has been done by hand. In 2D to 3D conversion projects, sub-masks can be adjusted manually within a region of interest when a human recognizable object rotates for example, and this process can be “tweened” such that the computer system automatically adjusts sub-masks from frame to frame between key frames to save additional labor.

2. Bezier curve animation with edge detection as an automatic animation guide

3. Polygon animation with edge detection as an automatic animation guide
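As an illustration of the “tweened” adjustment between key frames mentioned in item 1, the sketch below linearly interpolates the vertices of a polygon mask between two key frames. Linear interpolation, the function name `tween_polygon` and the vertex-list representation are assumptions made here for illustration; the edge-detection guide of items 2 and 3 and the luminance-based fitting of item 1 are not modeled.

```python
def tween_polygon(key_a, key_b, frame_a, frame_b, frame):
    """Linearly interpolate polygon mask vertices between two key frames.

    key_a and key_b are lists of (x, y) vertices with matching order;
    frame_a and frame_b are the key frame numbers.
    """
    if len(key_a) != len(key_b):
        raise ValueError("key-frame polygons need the same vertex count")
    if frame_a >= frame_b or not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between two distinct key frames")
    t = (frame - frame_a) / (frame_b - frame_a)
    return [
        (ax + t * (bx - ax), ay + t * (by - ay))
        for (ax, ay), (bx, by) in zip(key_a, key_b)
    ]

# A triangular sub-mask keyed at frames 10 and 14, evaluated at frame 11.
key_10 = [(0, 0), (4, 0), (4, 4)]
key_14 = [(8, 4), (12, 4), (12, 8)]
print(tween_polygon(key_10, key_14, 10, 14, 11))
```

In practice an operator would only correct the in-between frames where the interpolated outline visibly drifts from the object.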

In one or more embodiments of the invention, computer-generated elements are imported using RGBAZ files that include an optional alpha mask and/or depths on a pixel-by-pixel, or sub-pixel-by-sub-pixel basis for a computer-generated element. Examples of this type of file include the EXR file format. Any other file format capable of importing depth and/or alpha information is in keeping with the spirit of the invention. Embodiments of the invention import any type of file associated with a computer-generated element to provide instant depth values for a portion of an image associated with a computer-generated element. In this manner, no mask fitting or reshaping is required for any of the computer-generated elements from frame to frame since the alpha and depth on a pixel-by-pixel or sub-pixel-by-sub-pixel basis already exists, or is otherwise imported or obtained for the computer-generated element. For complicated movies with large amounts of computer-generated elements, the import and use of alpha and depth for computer-generated elements makes the conversion of a two-dimensional image to a pair of images for right and left eye viewing economically viable. One or more embodiments of the invention allow for the background elements and motion elements to have depths associated with them or otherwise set or adjusted, so that all objects other than computer-generated objects are artistically depth adjusted. In addition, embodiments of the invention allow for the translation, scaling or normalization of the depths, for example imported from an RGBAZ file, that are associated with computer-generated objects so as to maintain the relative integrity of depth for all of the elements in a frame or sequence of frames. In addition, any other metadata such as character mattes or alphas or other masks that exist for elements of the images that make up a movie can also be imported and utilized to improve the operator-defined masks used for conversion.
One format of a file that may be imported to obtain metadata for photographic elements in a scene includes the RGBA file format. By layering different objects from deepest to closest, i.e., “stacking” and applying any alpha or mask of each element, and translating the closest objects the most horizontally for left and right images, a final pair of depth enhanced images is thus created based on the input image and any computer-generated element metadata.
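The depth normalization and “stacking” described above can be sketched on a single scanline. This is a minimal sketch under assumptions that are not from the text: depths are rescaled linearly into a 0 to 1 range with 0 as the closest layer, transparent pixels are represented as `None`, the horizontal shift is `max_shift * (1 - depth)` pixels, and the sign used for each eye is arbitrary. The names `normalize_depths` and `render_view` are hypothetical.

```python
def normalize_depths(depths, near, far):
    """Linearly rescale imported depth values into the range [near, far]."""
    lo, hi = min(depths), max(depths)
    if hi == lo:
        return [near for _ in depths]
    return [near + (d - lo) * (far - near) / (hi - lo) for d in depths]

def render_view(width, layers, max_shift, direction):
    """Composite one scanline for one viewpoint.

    layers: list of (depth, pixels); depth is in [0, 1] with 0 closest,
    and a pixel value of None means fully transparent (alpha of zero).
    direction: +1 or -1, selecting which way close objects are translated.
    """
    out = [None] * width
    # Stack from deepest to closest so closer layers overwrite deeper ones.
    for depth, pixels in sorted(layers, key=lambda layer: layer[0], reverse=True):
        shift = direction * round(max_shift * (1.0 - depth))
        for x, value in enumerate(pixels):
            tx = x + shift
            if value is not None and 0 <= tx < width:
                out[tx] = value
    return out

background = (1.0, ["b"] * 6)
actor = (0.0, [None, None, "A", "A", None, None])
print(render_view(6, [actor, background], 1, +1))
print(render_view(6, [actor, background], 1, -1))
print(normalize_depths([2.0, 4.0, 6.0], 0.0, 1.0))
```

The closest layer moves the most between the two outputs while the deepest layer does not move, which is what produces the perceived depth difference.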

In another embodiment of this invention, these background elements and motion elements are combined separately into single frame representations of multiple frames, as tiled frame sets or as a single frame composite of all elements (i.e., including both motion and backgrounds/foregrounds) that then becomes a visual reference database for the computer controlled application of masks within a sequence composed of a multiplicity of frames. Each pixel address within the reference visual database corresponds to a mask/lookup table address within the digital frame and X, Y, Z location of subsequent “raw” frames that were used to create the reference visual database. Masks are applied to subsequent frames based on various differentiating image processing methods such as edge detection combined with pattern recognition and other sub-mask analysis, aided by operator segmented regions of interest from reference objects or frames, and operator directed detection of subsequent regions corresponding to the original region of interest. In this manner, the gray scale actively determines the location and shape of each mask (and corresponding color lookup from frame to frame for colorization projects or depth information for two-dimensional to three-dimensional conversion projects) that is applied in a keying fashion within predetermined and operator-controlled regions of interest.

Camera Pan Background and Static Foreground Elements: Stationary foreground and background elements in a plurality of sequential images comprising a camera pan are combined and fitted together using a series of phase correlation, image fitting and focal length estimation techniques to create a composite single frame that represents the series of images used in its construction. During the process of this construction the motion elements are removed through operator adjusted global placement of overlapping sequential frames.
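Phase correlation, one of the techniques named above, estimates the translation between two frames from the phase of their Fourier transforms. The following is a toy sketch only: it uses a naive discrete Fourier transform on a tiny grid instead of a Fast Fourier Transform, assumes a pure circular shift between the two frames, and ignores image fitting and focal length estimation. All function names are hypothetical.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2D discrete Fourier transform (only practical for tiny grids)."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = 2 * cmath.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(sign * 1j * angle)
            out[u][v] = total / (h * w) if inverse else total
    return out

def circular_shift(img, dy, dx):
    """Return img shifted down by dy and right by dx with wrap-around."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

def phase_correlate(a, b):
    """Estimate (dy, dx) such that b is a circularly shifted by (dy, dx)."""
    h, w = len(a), len(a[0])
    fa, fb = dft2(a), dft2(b)
    cross = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            c = fb[u][v] * fa[u][v].conjugate()
            # Keep only the phase; skip bins with no energy.
            cross[u][v] = c / abs(c) if abs(c) > 1e-9 else 0j
    corr = dft2(cross, inverse=True)
    # The correlation surface peaks at the shift between the two frames.
    best = max(((corr[y][x].real, y, x) for y in range(h) for x in range(w)))
    return best[1], best[2]

frame_a = [[(7 * y + 13 * x + 3 * x * y) % 11 for x in range(8)] for y in range(8)]
frame_b = circular_shift(frame_a, 2, 3)
print(phase_correlate(frame_a, frame_b))
```

Offsets recovered this way for each frame of a pan are the kind of per-frame placement data needed to fit overlapping frames into one composite.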

For colorization projects, the single background image representing the series of camera pan images is color designed using multiple color transform look up tables limited only by the number of pixels in the display. This allows the designer to include as much detail as desired including air brushing of mask information and other mask application techniques that provide maximum creative expression. For depth conversion projects (i.e., two-dimensional to three-dimensional movie conversion for example), the single background image representing the series of camera pan images may be utilized to set depths of the various items in the background. Once the background color/depth design is completed the mask information is transferred automatically to all the frames that were used to create the single composited image. In this manner, color or depth is performed once per multiple images and/or scene instead of once per frame, with color/depth information automatically spread to individual frames via embodiments of the invention. Masks from colorization projects may be combined or grouped for depth conversion projects since the colorization masks may contain more sub-areas than a depth conversion mask. For example, for a colorization project, a person's face may have several masks applied to areas such as lips, eyes, hair, while a depth conversion project may only require an outline of the person's head or an outline of a person's nose, or a few geometric shape sub-masks to which to apply depth. Masks from a colorization project can be utilized as a starting point for a depth conversion project since defining the outlines of human recognizable objects by itself is time consuming and can be utilized to start the depth conversion masking process to save time. Any computer-generated elements at the background level may be applied to the single background image.

In one or more embodiments of the invention, image offset information relative to each frame is registered in a text file during the creation of the single composite image representing the pan and used to apply the single composite mask to frames used to create the composite image.

Since the foreground moving elements have been masked separately prior to the application of the background mask, the background mask information is applied wherever there is no pre-existing mask information.
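The two preceding paragraphs can be illustrated with a small sketch that reads per-frame offsets from text and fills only the unmasked pixels of a frame from the composite mask. The text file layout (one `frame dx dy` triple per line), the use of 0 to mean “no pre-existing mask” and the function names are assumptions made for illustration; the source does not specify the file format.

```python
def parse_offsets(text):
    """Parse lines of 'frame_index dx dy' into {frame_index: (dx, dy)}."""
    offsets = {}
    for line in text.splitlines():
        if line.strip():
            frame, dx, dy = line.split()
            offsets[int(frame)] = (int(dx), int(dy))
    return offsets

def apply_background_mask(frame_mask, composite_mask, offset):
    """Fill unmasked pixels (value 0) of a frame mask from the composite mask.

    offset is the (dx, dy) position of the frame inside the composite image.
    Pixels that already carry a motion-element mask are left untouched.
    """
    dx, dy = offset
    result = [row[:] for row in frame_mask]
    for y, row in enumerate(result):
        for x, value in enumerate(row):
            if value == 0:  # no pre-existing mask information here
                row[x] = composite_mask[y + dy][x + dx]
    return result

offsets = parse_offsets("0 0 0\n1 2 0\n")
composite = [[1, 1, 2, 2, 3],
             [1, 1, 2, 2, 3]]
frame_1_mask = [[0, 9, 0],
                [0, 0, 9]]  # 9 marks a previously masked motion element
print(apply_background_mask(frame_1_mask, composite, offsets[1]))
```

Because the motion masks are written first and never overwritten, the background design done once on the composite reaches every frame without disturbing the foreground work.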

Static Camera Scenes With and Without Film Weave, Minor Camera Following and Camera Drift: In scenes where there is minor camera movement or film weave resulting from the sprocket transfer from 35 mm or 16 mm film to digital format, the motion objects are first fully masked using the techniques listed above. All frames in the scene are then processed automatically to create a single image that represents both the static foreground elements and background elements, eliminating all masked moving objects where they both occlude and expose the background.

Wherever the masked moving object exposes the background or foreground, the instance of background and foreground previously occluded is copied into the single image with priority and proper offsets to compensate for camera movement. The offset information is included in a text file associated with each single representation of the background so that the resulting mask information can be applied to each frame in the scene with proper mask offsets.
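A minimal sketch of building the single background image follows. It assumes that the first frame to expose a pixel wins (one possible reading of “with priority”), that per-frame offsets are already known, and that never-exposed pixels are left as `None`; the function name and data layout are hypothetical.

```python
def build_background(frames, motion_masks, offsets, width, height):
    """Merge frames into one plate, skipping pixels under motion masks.

    frames: list of 2D pixel grids; motion_masks: matching grids where a
    truthy value marks a masked moving object; offsets: (dx, dy) of each
    frame inside the plate. Pixels never exposed in any frame stay None.
    """
    plate = [[None] * width for _ in range(height)]
    for frame, mask, (dx, dy) in zip(frames, motion_masks, offsets):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                py, px = y + dy, x + dx
                # Copy only exposed background; the first exposure wins.
                if not mask[y][x] and plate[py][px] is None:
                    plate[py][px] = value
    return plate

frames = [[[10, 99, 12]], [[11, 12, 99]]]   # 99 is the moving object
masks = [[[0, 1, 0]], [[0, 0, 1]]]
offsets = [(0, 0), (1, 0)]                   # second frame drifted right
print(build_background(frames, masks, offsets, 4, 1))
```

The remaining `None` entry corresponds to an area the moving object never uncovers, which is where artist-generated background data would be needed.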

The single background image representing the series of static camera frames is color designed using multiple color transform look up tables limited only by the number of pixels in the display. Where the motion elements occlude the background elements continuously within the series of sequential frames they are seen as black figures that are ignored and masked over. The black objects are ignored in colorization-only projects during the masking operation because the resulting background mask is later applied to all frames used to create the single representation of the background only where there is no pre-existing mask. If background information is created for areas that are never exposed, then this data is treated as any other background data that is spread through a series of images based on the composite background. This allows for minimization of artifacts or artifact-free two-dimensional to three-dimensional conversion since there is never any need to stretch objects or extend pixels as for missing data, since image data that has been generated to be believable to the human observer is generated for and then taken from the occluded areas when needed during the depth conversion process. Hence for motion elements and computer-generated elements, realistic looking data can be utilized for areas behind these elements when none exists. This allows the designer to include as much detail as desired including air brushing of mask information and other mask application techniques that provide maximum creative expression. Once the background color design is completed the mask information is transferred automatically to all the frames that were used to create the single composited image.
For depth projects, the distance from the camera to each item in the composite frame is automatically transferred to all the frames that were used to create the single composited image. By shifting masked background objects horizontally more or less, their perceived depth is thus set in a secondary viewpoint frame that corresponds to each frame in the scene. This horizontal shifting may utilize data generated by an artist for the occluded areas or, alternatively, areas where no image data exists yet for a second viewpoint may be marked in one or more embodiments of the invention using a user defined color that allows for the creation of missing data to ensure that no artifacts occur during the two-dimension to three-dimension conversion process. Any technique known may be utilized in embodiments of the invention to cover areas in the background where unknown data exists (i.e., as displayed in some color that shows where the missing data exists) that may not be borrowed from another scene/frame, for example by having artists create complete backgrounds or smaller occluded areas with artist drawn objects. After assigning depths to objects in the composite background, or by importing depths associated with computer-generated elements at the background depth, a second viewpoint image may be created for each image in a scene in order to produce a stereoscopic view of the movie, for example a left eye view where the original frames in the scene are assigned to the right eye viewpoint, for example by translating foreground objects horizontally for the second viewpoint, or alternatively by translating foreground objects horizontally left and right to create two viewpoints offset from the original viewpoint.

One or more tools employed by the system enable real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove artifacts and to minimize or eliminate iterative workflow paths back through different workgroups by generating translation files that can be utilized as portable pixel-wise editing files. For example, a mask group takes source images and creates masks for items, areas or human recognizable objects in each frame of a sequence of images that make up a movie. The depth augmentation group applies depths, and for example shapes, to the masks created by the mask group. When rendering an image pair, left and right viewpoint images and left and right translation files may be generated by one or more embodiments of the invention. The left and right viewpoint images allow 3D viewing of the original 2D image. The translation files specify the pixel offsets for each source pixel in the original 2D image, for example in the form of UV or U maps. These files are generally related to an alpha mask for each layer, for example a layer for an actress, a layer for a door, a layer for a background, etc. These translation files, or maps, are passed from the depth augmentation group that renders 3D images, to the quality assurance workgroup. 
This allows the quality assurance workgroup (or other workgroup such as the depth augmentation group) to perform real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove artifacts such as masking errors without delays associated with processing time/re-rendering and/or iterative workflow that requires such re-rendering or sending the masks back to the mask group for rework, wherein the mask group may be in a third world country with unskilled labor on the other side of the globe. In addition, when rendering the left and right images, i.e., 3D images, the Z depth of regions within the image, such as actors for example, may also be passed along with the alpha mask to the quality assurance group, who may then adjust depth as well without re-rendering with the original rendering software. This may be performed for example with generated missing background data from any layer so as to allow “downstream” real-time editing without re-rendering or ray-tracing for example. Quality assurance may give feedback to the masking group or depth augmentation group for individuals so that these individuals may be instructed to produce work product as desired for the given project, without waiting for, or requiring the upstream groups to rework anything for the current project. This allows for feedback yet eliminates iterative delays involved with sending work product back for rework and the associated delay for waiting for the reworked work product. Elimination of iterations such as this provides a huge savings in wall-time, or end-to-end time that a conversion project takes, thereby increasing profits and minimizing the workforce needed to implement the workflow.
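
By way of illustration only, the role of a translation file may be sketched as follows. The sketch assumes a U map that stores one whole-pixel horizontal offset per source pixel; the names apply_u_map and MISSING, the list-of-rows image layout, and the rule that a larger shift belongs to a nearer layer are assumptions made for this example and are not taken from the specification.

```python
MISSING = None  # hypothetical marker for destination pixels that receive no source data


def apply_u_map(image, u_map):
    """Build a second-viewpoint image by shifting each source pixel horizontally.

    image and u_map are lists of rows with identical shape; each u_map entry
    is a whole-pixel horizontal offset for the matching source pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[MISSING] * width for _ in range(height)]
    for y in range(height):
        # Assumption: a larger shift means a nearer layer, so small shifts are
        # written first and larger shifts overwrite them where they collide.
        order = sorted(range(width), key=lambda x: abs(u_map[y][x]))
        for x in order:
            target = x + u_map[y][x]
            if 0 <= target < width:
                out[y][target] = image[y][x]
    return out


row = [["a", "b", "c", "d"]]
offsets = [[0, 1, 1, 0]]  # the two middle pixels belong to a foreground layer
shifted = apply_u_map(row, offsets)
```

In this sketch, destination pixels left as MISSING correspond to the areas where no image data exists for the second viewpoint. Because the map is plain per-pixel data, a downstream group could in principle change an offset and reapply the map without invoking the original rendering software.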

General Workflow for the Review Phase

Regardless of the type of project work performed on a given asset, the asset is reviewed for example using an interface that couples with the project management database to enable the viewing of work product. Generally, editorial role based users use the interface most, artists and stereographers less and lead artists the least. The review notes and images may be viewed simultaneously, for example with a clear background surrounding text that is overlaid on the image or scene to enable rapid review and feedback by a given worker having a particular role. Other improvements to the project management database include ratings of artists and difficulty of the asset. These fields enable workers to be rated and projected costs to be forecast when bidding projects, which is unknown in the field of motion picture project planning.

General Workflow for the Archive and Shipping Phase

Asset managers may delete and/or compress all assets that may be regenerated, which can save hundreds of terabytes of disk space for a typical motion picture. This enables an enormous savings in disk drive hardware purchases and is unknown in the art.
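
As a minimal sketch of this idea, and not a description of any particular implementation, an archive plan may separate assets that must be stored from assets that can be regenerated. The Asset class, its rebuilt_from field and the plan_archive function are hypothetical names introduced for this example, and the rule that an asset is skipped only when all of its inputs are archived originals is a conservative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    size_bytes: int
    rebuilt_from: list = field(default_factory=list)  # empty means "not regenerable"


def plan_archive(assets):
    """Split assets into those to archive and those that can be regenerated later."""
    originals = {a.name for a in assets if not a.rebuilt_from}
    keep, skip, saved = [], [], 0
    for asset in assets:
        # Skip only assets whose every input is an archived original.
        if asset.rebuilt_from and all(src in originals for src in asset.rebuilt_from):
            skip.append(asset.name)
            saved += asset.size_bytes
        else:
            keep.append(asset.name)
    return keep, skip, saved


demo_assets = [
    Asset("source_scan", 100),
    Asset("mask", 20),
    Asset("left_eye_render", 500, rebuilt_from=["source_scan", "mask"]),
]
keep, skip, saved = plan_archive(demo_assets)
```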

One or more embodiments of the system may be implemented with a computer and a database coupled with the computer. Any computer architecture having any number of computers, for example coupled via a computer communication network, is in keeping with the spirit of the invention. The database coupled with the computer includes at least a project table, shot table, task table and timesheet item table. The project table generally includes a project identifier and description of a project related to a motion picture. The shot table generally includes a shot identifier and references a plurality of images with a starting frame value and an ending frame value wherein the plurality of images are associated with the motion picture that is associated with the project. The shot table generally includes at least one shot having status related to progress of work performed on the shot. The task table generally references the project using a project identifier that is also located in the project table. The task table generally includes at least one task which generally includes a task identifier and an assigned worker, e.g., artist, and which may also include a context setting associated with a type of task related to motion picture work selected from region design, setup, motion, composite, and review. The at least one task generally includes a time allocated to complete the at least one task. The timesheet item table generally references the project identifier in the project table and the task identifier in the task table. The timesheet item table generally includes at least one timesheet item that includes a start time and an end time. In one or more embodiments of the invention, the computer is configured to present a first display configured to be viewed by an artist that includes at least one daily assignment having a context, project, shot and a status input that is configured to update the status in the task table and a timer input that is configured to update the start time and the end time in the timesheet item table. 
The computer is generally configured to present a second display configured to be viewed by a coordinator or “production” worker, i.e., production, that includes a search display having a context, project, shot, status and artist and wherein the second display further includes a list of a plurality of artists and respective status and actuals based on time spent in the at least one timesheet item versus the time allocated per the at least one task associated with the at least one shot. The computer is generally also configured to present a third display configured to be viewed by an editor that includes an annotation frame configured to accept commentary or drawing or both commentary and drawing on the at least one of said plurality of images associated with the at least one shot. One or more embodiments of the computer may be configured to provide the third display configured to be viewed by an editor that includes an annotation overlaid on at least one of the plurality of images. This capability provides information on one display that has generally required three workers to integrate in known systems, and which is novel in and of itself.
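
For illustration, the four core tables and the "actuals versus allocated" figure shown on the production display may be sketched with an in-memory SQLite database. All table and column names below are hypothetical, times are stored as plain hours for simplicity, and the shot_id column on the task table is an assumption added so that a task can be tied to a shot.

```python
import sqlite3

SCHEMA = """
CREATE TABLE project (project_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE shot (
    shot_id INTEGER PRIMARY KEY,
    start_frame INTEGER, end_frame INTEGER, status TEXT);
CREATE TABLE task (
    task_id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(project_id),
    shot_id INTEGER REFERENCES shot(shot_id),      -- assumed link to the shot
    artist TEXT, context TEXT, allocated_hours REAL);
CREATE TABLE timesheet_item (
    item_id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(project_id),
    task_id INTEGER REFERENCES task(task_id),
    start_time REAL, end_time REAL);               -- hours, for simplicity
"""


def actuals_by_artist(conn):
    """Return (artist, hours spent, hours allocated) for each artist."""
    return conn.execute("""
        SELECT artist, SUM(spent), SUM(allocated_hours)
        FROM (SELECT t.task_id, t.artist, t.allocated_hours,
                     COALESCE(SUM(ts.end_time - ts.start_time), 0) AS spent
              FROM task t
              LEFT JOIN timesheet_item ts ON ts.task_id = t.task_id
              GROUP BY t.task_id, t.artist, t.allocated_hours)
        GROUP BY artist ORDER BY artist
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO project VALUES (1, 'Feature conversion')")
conn.execute("INSERT INTO shot VALUES (10, 1, 48, 'in progress')")
conn.execute("INSERT INTO task VALUES (100, 1, 10, 'artist_a', 'composite', 8.0)")
conn.executemany("INSERT INTO timesheet_item VALUES (?, 1, 100, ?, ?)",
                 [(1, 9.0, 12.0), (2, 13.0, 15.5)])
rows = actuals_by_artist(conn)
```

The inner query totals time per task before the outer query groups by artist, so that a task with several timesheet items does not have its allocated time counted more than once.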

Embodiments of the database may also include a snapshot table which includes a snapshot identifier and search type and which includes a snapshot of the at least one shot, for example that includes a subset of the at least one shot wherein the snapshot is cached on the computer to reduce access to the shot table. Embodiments may also include other context settings for other types of task categories, for example source and cleanup related tasks. Any other context settings or values that are related to motion picture work may also be included in keeping with the spirit of the invention. Embodiments of the database may also include an asset request table that includes an asset request identifier and shot identifier that may be utilized to request work on assets or assets themselves to be worked on or created by other workers for example. Embodiments of the database may also include a request table that includes a mask request identifier and shot identifier and that may be utilized to request any type of action by another worker for example. Embodiments of the database may also include a note table which includes a note identifier and that references the project identifier and that includes at least one note related to at least one of the plurality of images from the motion picture. Embodiments of the database may also include a delivery table that includes a delivery identifier that references the project identifier and which includes information related to delivery of the motion picture.

One or more embodiments of the computer are configured to accept a rating input from production or the editor based on work performed by the artist, optionally in a blind manner in which the reviewer does not know the identity of the artist in order to prevent favoritism for example. One or more embodiments of the computer are configured to accept a difficulty of the at least one shot and calculate a rating based on work performed by the artist and based on the difficulty of the shot and time spent on the shot. One or more embodiments of the computer are configured to accept a rating input from production or editorial (i.e., an editor worker) based on work performed by the artist, or accept a difficulty of the at least one shot and calculate a rating based on work performed by the artist and based on the difficulty of the shot and time spent on the shot, and signify an incentive with respect to the artist based on the rating accepted by the computer or calculated by the computer. One or more embodiments of the computer are configured to estimate remaining cost based on the actuals that are based on total time spent for all of the at least one tasks associated with all of the at least one shot in the project versus time allocated for all of the at least one tasks associated with all of the at least one shot in the project. One or more embodiments of the computer are configured to compare the actuals associated with a first project with actuals associated with a second project and signify at least one worker to be assigned from the first project to the second project based on at least one rating of the first worker that is assigned to the first project. One or more embodiments of the computer are configured to analyze a prospective project having a number of shots and estimated difficulty per shot and based on actuals associated with the project, calculate a predicted cost for the prospective project. 
One or more embodiments of the computer are configured to analyze a prospective project having a number of shots and estimated difficulty per shot and, based on the actuals associated with a first previously performed project and a second previously performed project that completed after the first previously performed project, calculate a derivative of the actuals and calculate a predicted cost for the prospective project based on the derivative of the actuals. For example, as the process improves, tools improve and workers improve, the efficiency of work improves and the budgeting and bid processes can take this into account by calculating how efficiency is changing versus time and using this rate of change to predict costs for a prospective project. One or more embodiments of the computer are configured to analyze the actuals associated with said project and divide completed shots by total shots associated with said project and present a time of completion of the project. One or more embodiments of the computer are configured to analyze the actuals associated with the project and divide completed shots by total shots associated with the project, present a time of completion of the project, accept an input of at least one additional artist having a rating, accept a number of shots in which to use the additional artist, calculate a time savings based on the at least one additional artist and the number of shots, subtract the time savings from the time of completion of the project and present an updated time of completion of the project. One or more embodiments of the computer are configured to calculate amount of disk space that may be utilized to archive the project and signify at least one asset that may be rebuilt from other assets to avoid archival of the at least one asset. One or more embodiments of the computer are configured to display an error message if the artist is working with a frame number that is not current in the at least one shot. 
This may occur when fades, dissolves or other effects lengthen a particular shot for example wherein the shot contains frames not in the original source assets.
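
The cost prediction and completion estimates described above may be sketched as follows, under stated assumptions. The sketch treats "actuals" as hours spent per unit of shot difficulty, approximates the derivative of the actuals by a finite difference between two completed projects, and extrapolates linearly; the function names and these modeling choices are hypothetical and are not prescribed by the specification.

```python
def hours_per_difficulty(actual_hours, shot_difficulties):
    """Efficiency of a completed project: hours spent per unit of difficulty."""
    return actual_hours / sum(shot_difficulties)


def predict_hours(shot_difficulties, first, second, when):
    """Predict hours for a prospective project starting at time `when`.

    first and second are (completion_time, hours_per_difficulty) pairs for two
    previously performed projects, with second completing after first.
    """
    (t1, r1), (t2, r2) = first, second
    slope = (r2 - r1) / (t2 - t1)              # finite-difference "derivative"
    rate = max(r2 + slope * (when - t2), 0.0)  # linear extrapolation, floored at zero
    return rate * sum(shot_difficulties)


def remaining_time(elapsed, completed_shots, total_shots):
    """Estimate time left by dividing completed shots by total shots."""
    if completed_shots <= 0:
        raise ValueError("no completed shots to extrapolate from")
    fraction_done = completed_shots / total_shots
    return elapsed / fraction_done - elapsed


rate_a = hours_per_difficulty(600, [10, 20, 30])           # 10 hours per unit
predicted = predict_hours([1, 2, 3], (0, 10.0), (2, 8.0), 3)
days_left = remaining_time(30, 25, 100)
```

Multiplying the predicted hours by an hourly rate would give a predicted cost; the time savings from an additional artist is not modeled here because the specification does not state how it is computed.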

Metadata Grouping Tool

In one or more embodiments of the invention, the motion picture project management system includes a multi-stage production pipeline system for the motion picture projects. According to at least one embodiment as previously stated, the multi-stage production pipeline system includes a computer and a database that includes a shot table with a shot identifier associated with a plurality of images that are ordered in time and that make up a shot such that the shot table has a starting frame value and an ending frame value associated with each shot. In at least one embodiment, the plurality of images are associated with a motion picture and wherein the database further includes metadata associated with at least one shot or associated with regions within the plurality of images in the at least one shot, or both. In at least one embodiment of the invention, the multi-stage production pipeline system includes a project table, such that the project table includes a project identifier and description of a project related to the motion picture.

In one or more embodiments, the computer may present a grouping tool interface coupled with the computer and the database, wherein the grouping tool may present user interface elements, accept input of the metadata and accept selected shots associated with the metadata via the user interface elements. In at least one embodiment of the invention, the computer may one or more of store the metadata associated with the selected shots in the shot table, accept selected metadata to search the at least one shot, query the shot table with the selected metadata associated with the at least one shot or the regions within the plurality of images in the at least one shot or both, and display a list of shots having the selected metadata. In one or more embodiments, the list of shots may include at least one shot that is non-sequential in time in the motion picture with respect to another shot in the list of shots. The computer may assign work tasks based on the list of shots wherein the list of shots includes the at least one shot that is non-sequential in time in the motion picture with respect to another shot in the list of shots.
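
A minimal sketch of the metadata query described above follows. The Shot class, the find_shots and is_non_sequential functions, and the metadata keys used in the example are hypothetical and serve only to show how a list of shots sharing selected metadata may contain shots that are not adjacent in the motion picture.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    start_frame: int
    end_frame: int
    metadata: dict = field(default_factory=dict)


def find_shots(shots, **selected):
    """Return shots whose metadata matches every selected key/value, in time order."""
    hits = [s for s in shots
            if all(s.metadata.get(k) == v for k, v in selected.items())]
    return sorted(hits, key=lambda s: s.start_frame)


def is_non_sequential(hits, shots):
    """True if two consecutive hits are separated by other shots in the film."""
    order = [s.shot_id for s in sorted(shots, key=lambda s: s.start_frame)]
    positions = [order.index(s.shot_id) for s in hits]
    return any(b - a > 1 for a, b in zip(positions, positions[1:]))


film = [
    Shot("A", 1, 100, {"location": "kitchen", "framing": "CU"}),
    Shot("B", 101, 180, {"location": "street", "framing": "WS"}),
    Shot("C", 181, 240, {"location": "kitchen", "framing": "MS"}),
]
kitchen = find_shots(film, location="kitchen")
```

Here the kitchen shots A and C are separated by shot B, so work on both could be assigned together even though they are not consecutive in the film.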

In one or more embodiments, the computer may present a first display to be viewed by a production worker that includes a search display with one or more of a context, a project, a shot, the list of shots, a status and an artist, present a second display to be viewed by an artist that includes at least one daily assignment having a context, project and shot or the list of shots or both, and present a third display to be viewed by an editorial worker that includes an annotation frame to accept commentary or drawing or both commentary and drawing on at least one of the plurality of images associated with the at least one shot or the list of shots or both. In one or more embodiments of the invention, the at least one shot or the list of shots, or both, include status related to progress of work performed.

By way of one or more embodiments, the metadata associated with the at least one shot is associated with at least one metadata category. In one or more embodiments, the metadata category comprises one or more of a locale or location at which the shot was obtained, a subject that appears in the shot wherein the subject is a person, place or thing, a shot framing associated with the shot wherein the shot framing includes one or more of a close up or CU, a mid shot or MS, a wide shot or WS, and an extreme wide shot or XWS. In at least one embodiment, the grouping tool interface and the metadata categories may transition shot framing including any combination of CU, MS, WS and XWS, such that a CU may transition to a MS, WS or XWS, and/or a MS may transition to a CU, WS or XWS, and/or a WS may transition to a CU, MS or XWS, and/or a XWS may transition to a CU, MS or WS.

In one or more embodiments, the metadata associated with the at least one shot is associated with a metadata category, wherein the metadata category includes one or more of a depth complexity associated with the shot and a clean plate complexity associated with the shot. In at least one embodiment of the invention, the grouping tool interface may accept at least one additional metadata category and additional metadata values associated with the metadata category.

By way of one or more embodiments, the grouping tool interface may accept an input to designate the at least one shot as a master shot associated with depth, key selects or clean plate or any combination thereof. This enables the at least one shot to be utilized as a benchmark for quality or volume or to improve efficiency or both.

Embodiments enable a large studio workforce to work non-linearly on a film while maintaining a unified vision driven by key creative figures. Thus, work product is more consistent, higher quality, faster, less expensive and enables reuse of project files, masks and other production elements across projects since work is no longer constrained by shot order when using embodiments of the invention.

In at least one embodiment of the invention, the grouping tool interface includes a reference mask library tool with a plurality of reference masks, wherein each of the list of shots share selected metadata. In one or more embodiments, the reference mask library tool may present an interface to accept a selection of one or more of the plurality of reference masks to be utilized in shots in the list of shots that do not already utilize the reference mask associated with the subject. This may enable a worker to locate similar subjects in one or more additional shots and use reference masks associated with the subject located to add metadata to the one or more additional shots, without the need to reinvent metadata for each of those additional shots with the same subject such as a person, place or thing. In one or more embodiments, each one of the plurality of reference masks may be a dedicated template of the subject.

In one or more embodiments, the plurality of reference masks may be obtained from a second project associated with a second motion picture that differs from the first motion picture. This allows the system to create different sequels and/or motion pictures with similar subjects, such as people, places or things, from the first motion picture, with similar metadata to enable time efficiency, consistency and accuracy.

By way of one or more embodiments, the grouping tool interface may display a plurality of search results including the lists of shots associated with a plurality of respective selected metadata. In addition, in at least one embodiment of the invention, the grouping tool interface may present a timeline of the plurality of images associated with the list of shots.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a plurality of feature film or television film frames representing a scene or cut in which there is a single instance or perspective of a background.

FIG. 2 shows an isolated background processed scene from the plurality of frames shown in FIG. 1 in which all motion elements are removed using various subtraction and differencing techniques. The single background image is then used to create a background mask overlay representing designer selected color lookup tables in which dynamic pixel colors automatically compensate or adjust for moving shadows and other changes in luminance.

FIG. 3 shows that a representative sample of each motion object (M-Object) in the scene receives a mask overlay that represents designer selected color lookup tables in which dynamic pixel colors automatically compensate or adjust for moving shadows and other changes in luminance as the M-Object moves within the scene.

FIG. 4 shows that all mask elements of the scene are then rendered to create a fully colored frame in which M-Object masks are applied to each appropriate frame in the scene followed by the background mask, which is applied only where there is no pre-existing mask in a Boolean manner.

FIGS. 5A and 5B show a series of sequential frames loaded into display memory in which one frame is fully masked with the background (key frame) and ready for mask propagation to the subsequent frames via automatic mask fitting methods.

FIGS. 6A and 6B show the child window displaying an enlarged and scalable single image of the series of sequential images in display memory. The child window enables the operator to manipulate masks interactively on a single frame or in multiple frames during real time or slowed motion.

FIGS. 7A and 7B show that a single mask (flesh) is propagated automatically to all frames in the display memory.

FIG. 8 shows that all masks associated with the motion object are propagated to all sequential frames in display memory.

FIG. 9A shows a picture of a face.

FIG. 9B shows a close up of the face in FIG. 9A wherein the “small dark” pixels shown in FIG. 9B are used to calculate a weighted index using bilinear interpolation.

FIGS. 10A-D show searching for a Best Fit on the Error Surface: An error surface calculation in the Gradient Descent Search method involves calculating mean squared differences of pixels in the square fit box centered on reference image pixel (x0, y0), between the reference image frame and the corresponding (offset) location (x,y) on the search image frame.

FIGS. 11A-C show a second search box derived from a descent down the error surface gradient (evaluated separately), for which the evaluated error function is reduced, possibly minimized, with respect to the original reference box (evident from visual comparison of the boxes with the reference box in FIGS. 10A, B, C and D).

FIG. 12 depicts the gradient component evaluation. The error surface gradient is calculated as per definition of the gradient. Vertical and horizontal error deviations are evaluated at four positions near the search box center position, and combined to provide an estimate of the error gradient for that position 12.

FIG. 13 shows a propagated mask in the first sequential instance where there is little discrepancy between the underlying image data and the mask data. The dress mask and hand mask can be clearly seen to be off relative to the image data.

FIG. 14 shows that by using the automatic mask fitting routine, the mask data adjusts to the image data by referencing the underlying image data in the preceding image.

FIG. 15 shows that the mask data in later images within the sequence show marked discrepancy relative to the underlying image data. Eye makeup, lipstick, blush, hair, face, dress and hand image data are all displaced relative to the mask data.

FIG. 16 shows that the mask data is adjusted automatically based on the underlying image data from the previous mask and underlying image data.

FIG. 17 shows the mask data from FIG. 16 with appropriate color transforms after whole frame automatic mask fitting. The mask data is adjusted to fit the underlying luminance pattern based on data from the previous frame or from the initial key frame.

FIG. 18 shows polygons that are used to outline a region of interest for masking in frame one. The square polygon points snap to the edges of the object of interest. Using a Bezier curve the Bezier points snap to the object of interest and the control points/curves shape to the edges.

FIG. 19 shows that the entire polygon or Bezier curve is carried to a selected last frame in the display memory where the operator adjusts the polygon points or Bezier points and curves using the snap function which automatically snaps the points and curves to the edges of the object of interest.

FIG. 20 shows that if there is a marked discrepancy between the points and curves in frames between the two frames where there was an operator interactive adjustment, the operator will further adjust a frame in the middle of the plurality of frames where there is maximum error of fit.

FIG. 21 shows that when it is determined that the polygons or Bezier curves are correctly animating between the two adjusted frames, the appropriate masks are applied to all frames.

FIG. 22 shows the resulting masks from a polygon or Bezier animation with automatic point and curve snap to edges. The brown masks are the color transforms and the green masks are the arbitrary color masks.

FIG. 23 shows an example of two pass blending: The objective in two-pass blending is to eliminate moving objects from the final blended mosaic. This can be done by first blending the frames so the moving object is completely removed from the left side of the background mosaic. As shown in FIG. 23, the character is removed from the scene, but can still be seen in the right side of the background mosaic.

FIG. 24 shows the second pass blend. A second background mosaic is then generated, where the blend position and width is used so that the moving object is removed from the right side of the final background mosaic. As shown in FIG. 24, the character is removed from the scene, but can still be seen in the left side of the background mosaic. In the second pass blend as shown in FIG. 24, the moving character is shown on the left.

FIG. 25 shows the final background corresponding to FIGS. 23-24. The two passes are blended together to generate the final blended background mosaic with the moving object removed from the scene. As shown in FIG. 25, the final blended background is shown with the moving character removed.

FIG. 26 shows an edit frame pair window.

FIG. 27 shows sequential frames representing a camera pan that are loaded into memory. The motion object (butler moving left to the door) has been masked with a series of color transform information leaving the background black and white with no masks or color transform information applied.

FIG. 28 shows six representative sequential frames of the pan above, displayed for clarity.

FIG. 29 shows the composite or montage image of the entire camera pan that was built using phase correlation techniques. The motion object (butler) is included as a transparency for reference by keeping the first and last frame and averaging the phase correlation in two directions. The single montage representation of the pan is color designed using the same color transform masking techniques as used for the foreground object.

FIG. 30 shows the sequence of frames in the camera pan after the background mask color transforms from the montage have been applied to each frame used to create the montage. The mask is applied where there is no pre-existing mask thus retaining the motion object mask and color transform information while applying the background information with appropriate offsets.

FIG. 31 shows a selected sequence of frames in the pan for clarity after the color background masks have been automatically applied to the frames where there are no pre-existing masks.

FIG. 32 shows a sequence of frames in which all moving objects (actors) are masked with separate color transforms.

FIG. 33 shows a sequence of selected frames for clarity prior to background mask information. All motion elements have been fully masked using the automatic mask-fitting algorithm.

FIG. 34 shows the stationary background and foreground information minus the previously masked moving objects. In this case, the single representation of the complete background has been masked with color transforms in a manner similar to the motion objects. Note that outlines of removed foreground objects appear truncated and unrecognizable due to their motion across the input frame sequence interval, i.e., the black objects in the frame represent areas in which the motion objects (actors) never expose the background and foreground. The black objects are ignored during the masking operation in colorization-only projects because the resulting background mask is later applied to all frames used to create the single representation of the background only where there is no pre-existing mask. In depth conversion projects the missing data area may be displayed so that image data may be obtained/generated for the missing data area so as to provide visually believable image data when translating foreground objects horizontally to generate a second viewpoint.

FIG. 35 shows the sequential frames in the static camera scene cut after the background mask information has been applied to each frame with appropriate offsets and where there is no pre-existing mask information.

FIG. 36 shows a representative sample of frames from the static camera scene cut after the background information has been applied with appropriate offsets and where there is no pre-existing mask information.

FIGS. 37A-C show embodiments of the Mask Fitting functions, including calculate fit grid and interpolate mask on fit grid.

FIGS. 38A-B show embodiments of the extract background functions.

FIGS. 39A-C show embodiments of the snap point functions.

FIGS. 40A-C show embodiments of the bimodal threshold masking functions, wherein FIG. 40C corresponds to step 2.1 in FIG. 40A, namely “Create Image of Light/Dark Cursor Shape” and FIG. 40B corresponds to step 2.2 in FIG. 40A, namely “Apply Light/Dark shape to mask”.

FIGS. 41A-B show embodiments of the calculate fit value functions.

FIG. 42 shows two image frames that are separated in time by several frames, of a person levitating a crystal ball wherein the various objects in the image frames are to be converted from two-dimensional objects to three-dimensional objects.

FIG. 43 shows the masking of the first object in the first image frame that is to be converted from a two-dimensional image to a three-dimensional image.

FIG. 44 shows the masking of the second object in the first image frame.

FIG. 45 shows the two masks in color in the first image frame allowing for the portions associated with the masks to be viewed.

FIG. 46 shows the masking of the third object in the first image frame.

FIG. 47 shows the three masks in color in the first image frame allowing for the portions associated with the masks to be viewed.

FIG. 48 shows the masking of the fourth object in the first image frame.

FIG. 49 shows the masking of the fifth object in the first image frame.

FIG. 50 shows a control panel for the creation of three-dimensional images, including the association of layers and three-dimensional objects to masks within an image frame, specifically showing the creation of a Plane layer for the sleeve of the person in the image.

FIG. 51 shows a three-dimensional view of the various masks shown in FIGS. 43-49, wherein the mask associated with the sleeve of the person is shown as a Plane layer that is rotated toward the left and right viewpoints on the right of the page.

FIG. 52 shows a slightly rotated view of FIG. 51.

FIG. 53 shows a slightly rotated view of FIG. 51.

FIG. 54 shows a control panel specifically showing the creation of a sphere object for the crystal ball in front of the person in the image.

FIG. 55 shows the application of the sphere object to the flat mask of the crystal ball, which is shown within the sphere and as projected to the front and back of the sphere to show the depth assigned to the crystal ball.

FIG. 56 shows a top view of the three-dimensional representation of the first image frame, in which the Z-dimension assigned to the crystal ball shows that the crystal ball is in front of the person in the scene.

FIG. 57 shows the sleeve plane rotating in the X-axis to make the sleeve appear to be coming out of the image more.

FIG. 58 shows a control panel specifically showing the creation of a Head object for application to the person's face in the image, i.e., to give the person's face realistic depth without requiring a wire model for example.

FIG. 59 shows the Head object in the three-dimensional view, too large and not aligned with the actual person's head.

FIG. 60 shows the Head object in the three-dimensional view, resized to fit the person's face and aligned, e.g., translated to the position of the actual person's head.

FIG. 61 shows the Head object in the three-dimensional view, with the Y-axis rotation shown by the circle and Y-axis originating from the person's head, thus allowing for the correct rotation of the Head object to correspond to the orientation of the person's face.

FIG. 62 shows the Head object also rotated slightly clockwise, about the Z-axis, to correspond to the person's slightly tilted head.

FIG. 63 shows the propagation of the masks into the second and final image frame.

FIG. 64 shows the original position of the mask corresponding to the person's hand.

FIG. 65 shows the reshaping of the mask, which can be performed automatically and/or manually, wherein any intermediate frames get the tweened depth information between the first image frame masks and the second image frame masks.

FIG. 66 shows the missing information for the left viewpoint as highlighted in color on the left side of the masked objects in the lower image when the foreground object, here a crystal ball, is translated to the right.

FIG. 67 shows the missing information for the right viewpoint as highlighted in color on the right side of the masked objects in the lower image when the foreground object, here a crystal ball, is translated to the left.

FIG. 68 shows an anaglyph of the final depth enhanced first image frame viewable with Red/Blue 3-D glasses.

FIG. 69 shows an anaglyph of the final depth enhanced second and last image frame viewable with Red/Blue 3-D glasses; note the rotation of the person's head, movement of the person's hand and movement of the crystal ball.

FIG. 70 shows the right side of the crystal ball with fill mode “smear”, wherein the pixels with missing information for the left viewpoint, i.e., on the right side of the crystal ball, are taken from the right edge of the missing image pixels and “smeared” horizontally to cover the missing information.

FIG. 71 shows a mask or alpha plane, for an actor's upper torso and head (and transparent wings). The mask may include opaque areas shown as black and transparent areas that are shown as grey areas.

FIG. 72 shows an occluded area that corresponds to the actor of FIG. 71 and that shows an area of the background that is never exposed in any frame in a scene. This may be a composite background for example.

FIG. 73 shows the occluded area artistically rendered to generate a complete and realistic background for use in two-dimensional to three-dimensional conversion, so as to enable an artifact-free conversion.

FIG. 73A shows the occluded area partially drawn or otherwise rendered to generate just enough of a realistic looking background for use in minimizing artifacts in two-dimensional to three-dimensional conversion.

FIG. 74 shows a light area of the shoulder portion on the right side of FIG. 71 that represents a gap where stretching (as is also shown in FIG. 70) would be used when shifting the foreground object to the left to create a right viewpoint. The dark portion of the figure is taken from the background where data is available in at least one frame of a scene.

FIG. 75 shows an example of the stretching of pixels, i.e., smearing, corresponding to the light area in FIG. 74 without the use of a generated background, i.e., if no background data is available for an area that is occluded in all frames of a scene.

FIG. 76 shows a result of a right viewpoint without artifacts on the edge of the shoulder of the person wherein the dark area includes pixels available in one or more frames of a scene, and generated data for always-occluded areas of a scene.

FIG. 77 shows an example of a computer-generated element, here a robot, which is modeled in three-dimensional space and projected as a two-dimensional image. If metadata such as alpha, mask, depth or any combination thereof exists, the metadata can be utilized to speed the conversion process from a two-dimensional image to a pair of two-dimensional images for left and right eye for three-dimensional viewing.

FIG. 78 shows an original image separated into a background and foreground elements (mountain and sky in the background and soldiers in the bottom left; also see FIG. 79), along with the imported color and depth of the computer-generated element, i.e., the robot with depth automatically set via the imported depth metadata. As shown in the background, any area that is covered for the scene can be artistically rendered for example to provide believable missing data, as is shown in FIG. 73 based on the missing data of FIG. 73A, which results in artifact free edges as shown in FIG. 76 for example.

FIG. 79 shows masks associated with the photograph of soldiers in the foreground to apply depth to the various portions of the soldiers that lie in depth in front of the computer-generated element, i.e., the robot. The dashed lines horizontally extending from the mask areas show where horizontal translation of the foreground objects takes place and where imported metadata can be utilized to accurately auto-correct over-painting of depth or color on the masked objects when metadata exists for the other elements of a movie, for example, when an alpha exists for the objects that occur in front of the computer-generated elements. One type of file that can be utilized to obtain mask edge data is a file with alpha and/or mask data such as an RGBA file.

FIG. 80 shows an imported alpha layer which can also be utilized as a mask layer to limit the operator defined, and potentially less accurate masks used for applying depth to the edges of the three soldiers A, B and C. In addition, a computer-generated element for dust can be inserted into the scene along the line annotated as “DUST”, to augment the reality of the scene.

FIG. 81 shows the result of using the operator-defined masks without adjustment when overlaying a motion element such as the soldier on the computer-generated element such as the robot. Through use of the alpha metadata of FIG. 80 applied to the operator-defined mask edges of FIG. 79, artifact free edges on the overlapping areas are thus enabled.

FIG. 82 shows a source image to be depth enhanced and provided along with left and right translation files and alpha masks so that downstream workgroups may perform real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove and/or adjust depths without iterative workflow paths back to the original workgroups.

FIG. 83 shows masks generated by the mask workgroup for the application of depth by the depth augmentation group, wherein the masks are associated with objects, such as for example human recognizable objects in the source image of FIG. 82.

FIG. 84 shows areas where depth is applied generally as darker for nearer objects and lighter for objects that are further away.

FIG. 85A shows a left UV map containing translations or offsets in the horizontal direction for each source pixel.

FIG. 85B shows a right UV map containing translations or offsets in the horizontal direction for each source pixel.

FIG. 85C shows a black value shifted portion of the left UV map of FIG. 85A to show the subtle contents therein.

FIG. 85D shows a black value shifted portion of the right UV map of FIG. 85B to show the subtle contents therein.

FIG. 86A shows a left U map containing translations or offsets in the horizontal direction for each source pixel.

FIG. 86B shows a right U map containing translations or offsets in the horizontal direction for each source pixel.

FIG. 86C shows a black value shifted portion of the left U map of FIG. 86A to show the subtle contents therein.

FIG. 86D shows a black value shifted portion of the right U map of FIG. 86B to show the subtle contents therein.

FIG. 87 shows known uses for UV maps, wherein a three-dimensional model is unfolded so that an image in UV space can be painted onto the 3D model using the UV map.

FIG. 88 shows a disparity map showing the areas where the difference between the left and right translation maps is the largest.

FIG. 89 shows a left eye rendering of the source image of FIG. 82.

FIG. 90 shows a right eye rendering of the source image of FIG. 82.

FIG. 91 shows an anaglyph of the images of FIG. 89 and FIG. 90 for use with Red/Blue glasses.

FIG. 92 shows an image that has been masked and is in the process of depth enhancement for the various layers.

FIG. 93 shows a UV map overlaid onto an alpha mask associated with the actress shown in FIG. 92 which sets the translation offsets in the resulting left and right UV maps based on the depth settings of the various pixels in the alpha mask.

FIG. 94 shows a workspace generated for a second depth enhancement program, or compositing program such as NUKE®, i.e., generated for the various layers shown in FIG. 92, i.e., left and right UV translation maps for each of the alphas wherein the workspace allows for quality assurance personnel (or other work groups) to perform real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove artifacts or otherwise adjust masks and hence alter the 3D image pair (or anaglyph) without iteratively sending fixes to any other workgroup.

FIG. 95 shows a workflow for iterative corrective workflow.

FIG. 96 shows an embodiment of the workflow enabled by one or more embodiments of the system in that each workgroup can perform real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove artifacts and otherwise correct work product from another workgroup without iterative delays associated with re-rendering/ray-tracing or sending work product back through the workflow for corrections.

FIG. 97 illustrates an architectural view of an embodiment of the invention.

FIG. 98 illustrates an annotated view of a session manager window utilized to define a plurality of images to work on or assign work to.

FIG. 99 illustrates a view of the production display showing a project, shots and tasks related to the selected shot, along with status for each task context associated with the shot.

FIG. 100 illustrates a view of the actuals associated with a particular shot within a project for each task context associated with the shot wherein “under bid” task actuals are shown in a first manner and tasks that are within a predefined percentage within the bid amount are shown in a second manner while tasks that are over bid are shown in a third manner.

FIG. 101 illustrates the amount of disk space that may be saved by deleting files which can be reconstructed from other files, for example after completion of a project to save disk drive expenditures.

FIG. 102 illustrates a view of an artist display showing the task context, project, shot, status, tools, start time button, check-in input, rendering input, internal shot review input, meal, start/time/stop, review and submit inputs.

FIG. 103 illustrates an annotated view of the menu bar of the artist display.

FIG. 104 illustrates an annotated view of the task row of the artist display.

FIG. 105 illustrates an annotated view of the main portion of the user interface of the artist display.

FIG. 106 illustrates a build timeline display for the artist display for creating a timeline to work on.

FIG. 107 illustrates a browse snapshots display for the artist display that enables an artist to view the snapshot of a shot or otherwise cache important information related to a shot so that the database does not have to field requests for often utilized data.

FIG. 108 illustrates an artist actual window showing the actuals of time spent on tasks, for example versus time allocated for the tasks, with dropdown menus for specific timesheets.

FIG. 109 illustrates a notes display for the artist display that enables artists to enter a note related to the shot.

FIG. 110 illustrates a check in display for the artist display that enables work to be checked in after work on the shot is complete.

FIG. 111 illustrates a view of an editorial display showing the project, filter inputs, timeline inputs and search results showing the shots in the main portion of the window along with the context of work and assigned worker.

FIG. 112 illustrates a view of the session manager display of the editorial display for selecting shots to review.

FIG. 113 illustrates a view of the advanced search display of the editorial display.

FIG. 114 illustrates a view of the simple search display of the editorial display.

FIG. 115 illustrates a view of the reviewing pane for a shot also showing integrated notes and/or snapshot information in the same frame.

FIG. 116 illustrates a view of the timelines to select for review and/or checkin after modification.

FIG. 117 illustrates the annotations added to a frame for feedback using the tool of FIG. 117.

FIG. 118 illustrates an overview of the grouping tool interface according to one or more embodiments.

FIG. 119 illustrates a view of the grouping tool interface with metadata tags and shot tables according to one or more embodiments.

FIG. 120 illustrates a view of the grouping tool interface with shots that share metadata characteristics according to one or more embodiments.

FIG. 121 illustrates a view of the grouping tool interface with a key select shot and color code, depth complexity and color code and clean plate complexity and color code according to one or more embodiments.

FIG. 122 illustrates another view of the grouping tool interface according to one or more embodiments.

FIG. 123 illustrates a close-up view of the grouping tool interface according to one or more embodiments.

FIG. 124 illustrates a close-up view of a shot of the grouping tool interface according to one or more embodiments.

FIG. 125 illustrates a close-up view of a plurality of shots to add metadata to according to one or more embodiments.

FIG. 126 illustrates an overall view of the grouping tool interface with a shot from a plurality of images with metadata.

DETAILED DESCRIPTION OF THE INVENTION

FIGS. 118-126 illustrate embodiments of the invention directed at non-linear workflow. The interface is described at the bottom portion of the detailed description after the project management system portion and overall production workflow are described.

FIG. 97 illustrates an architectural view of an embodiment of the project management portion of the invention and system elements that enable non-linear workflow on motion picture projects. One or more embodiments of the system include computer 9702 and database 9701 coupled with computer 9702. Any computer architecture having any number of computers, for example coupled via a computer communication network, is in keeping with the spirit of the invention. Database 9701 coupled with computer 9702 includes at least a project table, shot table, task table and timesheet table. The project table generally includes a project identifier and description of a project related to a motion picture. The shot table generally includes a shot identifier and references a plurality of images with a starting frame value and an ending frame value wherein the plurality of images are associated with the motion picture that is associated with the project. The shot table generally includes at least one shot having status related to progress of work performed on the shot. The task table generally references the project using a project identifier that is also located in the project table. The task table generally includes at least one task which generally includes a task identifier and an assigned worker, e.g., artist, and which may also include a context setting associated with a type of task related to motion picture work selected from region design, setup, motion, composite, and review for example (or any other set of motion picture related types of tasks). The context setting may also imply or have a default workflow so that region design flows into depth which flows into composite. This enables the system to assign the next type of task, or context, that the shot is to have work performed on. The flow may be linear or may iterate back for rework for example. The at least one task generally includes a time allocated to complete the at least one task. The timesheet item table generally references the project identifier in the project table and the task identifier in the task table. The task table generally includes at least one timesheet item that includes a start time and an end time. The completion of tasks may in one or more embodiments set the context of the task to the next task in sequence in a workflow and the system may automatically notify the next worker in the workflow based on the next context of work to be performed and the workers may work under different contexts as previously described. In one or more embodiments contexts may have sub-contexts, i.e., region design may be broken into masking and outsource masking while depth may be broken into key frame and motion contexts, depending on the desired workflow for the specific type of operations to be performed on a motion picture project.
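
By way of illustration only, the default workflow implied by the context setting may be sketched as follows. This is a minimal sketch that assumes the three-stage order given in the example above (region design, depth, composite); the names DEFAULT_WORKFLOW and next_context and the rework_to parameter are hypothetical and are not taken from any embodiment.

```python
from typing import Optional, Sequence

# Hypothetical default order, taken from the example in the description:
# region design flows into depth, which flows into composite.
DEFAULT_WORKFLOW = ("region design", "depth", "composite")


def next_context(
    current: str,
    workflow: Sequence[str] = DEFAULT_WORKFLOW,
    rework_to: Optional[str] = None,
) -> Optional[str]:
    """Return the context a shot moves to when work in `current` completes.

    If `rework_to` is given, the flow iterates back to that context
    instead of advancing. Returns None when the workflow is finished.
    """
    if rework_to is not None:
        if rework_to not in workflow:
            raise ValueError(f"unknown context: {rework_to!r}")
        return rework_to
    idx = list(workflow).index(current)  # raises ValueError if unknown
    if idx + 1 < len(workflow):
        return workflow[idx + 1]
    return None
```

A system along these lines could use the returned context to look up and notify the next assigned worker; that lookup is omitted from the sketch.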

Embodiments of the database may also include a snapshot table which includes a snapshot identifier and search type and which includes a snapshot of the at least one shot, for example that includes a subset of the at least one shot wherein the snapshot is cached on the computer to reduce access to the shot table. Other embodiments of the snapshot table keep track of the resources on the network, store information about the resources and track versioning of the resources. Embodiments may also include other context settings for other types of task categories, for example source and cleanup related tasks. Any other context settings or values that are related to motion picture work may also be included in keeping with the spirit of the invention. Embodiments of the database may also include an asset request table that includes an asset request identifier and shot identifier that may be utilized to request work on assets or assets themselves to be worked on or created by other workers for example. Embodiments of the database may also include a request table that includes a mask request identifier and shot identifier and that may be utilized to request any type of action by another worker for example. Embodiments of the database may also include a note table which includes a note identifier and that references the project identifier and that includes at least one note related to at least one of the plurality of images from the motion picture. Embodiments of the database may also include a delivery table that includes a delivery identifier that references the project identifier and which includes information related to delivery of the motion picture.

One or more embodiments of the database may utilize a schema as follows or any other schema that is capable of supporting the functionality of the invention as specifically programmed in computer 9702 and as described in any combination or sub-combination as follows so long as motion picture project management may be performed as detailed herein or so as to in any other way better manage motion picture projects using the exemplary specifications herein:

Project Table

unique project identifier, project code (text name), title of motion picture, type of project (test or for hire), last database update date and time, status (retired or active), last version update of the database, project type (colorization, effects, 2D->3D conversion, feature film, catalogue), lead worker, review drive (where the review shots are stored).

Task Table

unique task identifier, assigned worker, description (what to do), status (ingested, waiting, complete, returned, approved), bid start date, bid end date, bid duration, actual start date, actual end date, priority, context (stacking, asset, motion, motion visual effects, outsource, cleanup, alpha generation, composite, masking, clean plate, setup, keyframe, quality control), project code in project table, supervisor or production or editorial worker, time spent per process.

Snapshot Table

unique snapshot identifier, search type (which project the search is for), description (notes related to the shot), login (worker associated with the snapshot), timestamp, context (setup, cleanup, motion, composite . . . ), version of the snapshot on the shot, snapshot type (directory, file, information, review), project code, review sequence data (where the data is stored on the network), asset name (alpha, mask, . . . ), snapshots used (codes of other snapshots used to make this snapshot), checkin path (path to where the data was checked in from), tool version, review date, archived, rebuildable (true or false), source delete, source delete date, source delete login.

Note Table

unique note identifier, project code, search type, search id, login (worker id), context (composite, review, motion, editorial . . . ), timestamp, note (text description of note associated with set of images defined by search).

Delivery Table

unique delivery identifier, login (of worker), timestamp, status (retired or not), delivery method (how it was delivered), description (what type of media used to deliver project), returned (true or false), drive (serial number of the drive), case (serial number on the case), delivery date, project identifier, client (text name of client), producer (name of producer).

Delivery_Item Table

unique delivery item identifier, timestamp, delivery code, project identifier, file path (where delivery item is stored).

Timesheet Table

time sheet unique identifier, login (worker), timestamp, total time, timesheet approver, start time, end time, meal 1 (half hour break start time), meal 2 (half hour break start time), status (pending or approved).

Timesheet_Item Table

timesheet item unique identifier, login (worker), timestamp, context (region design, composite, rendering, motion, management, mask cleanup, training, cleanup, admin), project identifier, timesheet identifier, start time, end time, status (pending or approved), approved by (worker), task identifier.

Sequence Table

sequence unique identifier, login (worker that defined sequence), timestamp, shot order (that makes up the sequence).

Shot Table

shot unique identifier, login (worker that defined shot), timestamp, shot status (in progress, final, final client approval), client status (composite in progress, depth client review, composite client review, final), description (text description of shot, e.g., 2 planes flying by each other), first frame number, last frame number, number of frames, assigned worker, region design, depth completion date, depth worker assigned, composite supervisor, composite lead worker, composite completion date.

Asset_Request Table

asset unique identifier, timestamp, asset worker assigned, status (pending or resolved), shot identifier, problem description in text, production worker, lead worker assigned, priority, due date.

Mask_Request Table

mask request unique identifier, login (worker making mask request), timestamp, depth artist, depth lead, depth coordinator or production worker, masking issues, masks (versions with issue related to mask request), source used, due date, rework notes.
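
For illustration, a reduced form of the exemplary schema above may be expressed in SQL. The following is a minimal sketch, assuming an SQLite database accessed through Python's standard sqlite3 module; only a subset of the listed fields is shown, and the table names, column names, column types and the hours_spent helper are assumptions made for the example rather than part of any embodiment.

```python
import sqlite3

# Hypothetical subset of the exemplary schema; names and types are assumed.
SCHEMA = """
CREATE TABLE project (
    project_id   INTEGER PRIMARY KEY,
    project_code TEXT NOT NULL UNIQUE,
    title        TEXT,
    status       TEXT
);
CREATE TABLE shot (
    shot_id     INTEGER PRIMARY KEY,
    description TEXT,
    first_frame INTEGER NOT NULL,
    last_frame  INTEGER NOT NULL,
    shot_status TEXT
);
CREATE TABLE task (
    task_id            INTEGER PRIMARY KEY,
    project_code       TEXT NOT NULL REFERENCES project(project_code),
    assigned_worker    TEXT,
    context            TEXT,
    status             TEXT,
    bid_duration_hours REAL
);
CREATE TABLE timesheet_item (
    timesheet_item_id INTEGER PRIMARY KEY,
    project_id        INTEGER NOT NULL REFERENCES project(project_id),
    task_id           INTEGER NOT NULL REFERENCES task(task_id),
    login             TEXT,
    start_time        TEXT,  -- 'YYYY-MM-DD HH:MM:SS'
    end_time          TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the illustrative tables on an open connection."""
    conn.executescript(SCHEMA)


def hours_spent(conn: sqlite3.Connection, task_id: int) -> float:
    """Sum the hours recorded in timesheet items for one task (the actuals)."""
    row = conn.execute(
        "SELECT COALESCE(SUM((julianday(end_time) - julianday(start_time)) * 24.0), 0.0) "
        "FROM timesheet_item WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    return row[0]
```

In this sketch the actuals for a task are derived from the timesheet items that reference it, which can then be compared with the bid duration stored on the task.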

In one or more embodiments of the invention, the computer is generally configured to present a session manager to select a series of images within a shot to work on and/or assign tasks to or to review. The computer is generally configured to present a first display configured to be viewed by production that includes a search display having a context, project, shot, status and artist and wherein the first display further includes a list of a plurality of artists and respective status and actuals based on time spent in the at least one timesheet item versus the time allocated per the at least one task associated with the at least one shot.

The database may also include tables and fields for the support of non-linear workflow, including any type of metadata related to a scene or regions within an image or several images that comprise a scene as is discussed at the end of this section.

FIG. 98 illustrates an annotated view of a session manager window utilized to define a plurality of images to work on or assign work to or to review for example. Computer 9702 accepts inputs for the project, sequence (the motion picture or a trailer for example that utilizes shots in a particular sequence), along with the shot, mask version and various frame offsets and optionally downloads the images to the local computer for local processing for example. Each field on the figure is annotated with further details.

FIG. 99 illustrates a view of the production display showing a project, shots and tasks related to the selected shot, along with status for each task context associated with the shot. As shown on the left, the shots that make up a project may be selected, which redirects the main portion of the window to display information related to the selected shot(s) including tabs for “shot info”, “assets” used in the shot, frames to select to view, notes, task information, actuals, check-in and data integrity. As shown in the main portion of the window, several task contexts are shown with their associated status, assigned workers, etc. Production utilizes this display and the computer accepts inputs from the production worker, e.g., a user, via this display to set tasks for artists along with allotted times. A small view of the shot is shown in the lower left of the display to give production workers a view of the motion picture shot related to the tasks and context settings. One potential goal of the production role is to assign tasks and review status and actuals, while one potential goal of the artist is to use a set of tools to manipulate images, while one potential goal of the reviewer is to have a higher resolution image for review with integrated metadata related to the shot and status thereof. In other words, the displays in the system are tailored for roles, yet integrated to prioritize the relevant information mainly of importance to that role in the motion picture creation/conversion process.

FIG. 100 illustrates a view of the actuals associated with a particular shot within a project for each task context associated with the shot wherein “under bid” task actuals are shown in a first manner and tasks that are within a predefined percentage within the bid amount are shown in a second manner while tasks that are over bid are shown in a third manner.
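
The three display manners described for FIG. 100 may be sketched as a simple classification of actuals against the bid. This is a minimal sketch only; the function name classify_actual, the category labels and the default of 10 percent for the predefined percentage are assumptions chosen for the example.

```python
def classify_actual(actual_hours: float, bid_hours: float,
                    near_fraction: float = 0.10) -> str:
    """Classify task actuals relative to the bid amount.

    "over bid"  : actuals exceed the bid (third manner)
    "near bid"  : actuals are within `near_fraction` of the bid (second manner)
    "under bid" : actuals are comfortably below the bid (first manner)

    `near_fraction` stands in for the predefined percentage; 0.10 is an
    assumed value, not one given in the description.
    """
    if bid_hours <= 0:
        raise ValueError("bid_hours must be positive")
    if actual_hours > bid_hours:
        return "over bid"
    if actual_hours >= bid_hours * (1.0 - near_fraction):
        return "near bid"
    return "under bid"
```

A display could then map each returned category to a distinct color or other visual manner.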

FIG. 101 illustrates the amount of disk space that may be saved by deleting files which can be reconstructed from other files, for example after completion of a project to save disk drive expenditures. As shown, rebuildable amounts for a project that is only partially complete can easily be in the terabyte range. By being able to securely delete rebuildable assets and compress other assets after a project is over, huge amounts of disk drive costs may be saved. In one or more embodiments the computer accesses the database and determines which resources depend on other resources and whether they can be compressed and with what general compression ratio, which may be calculated in advance and/or based on other projects for example. The computer then calculates the total storage and the amount of storage that may be freed up by compression and/or regeneration of resources and displays the information on a computer display for example.
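
The storage calculation described for FIG. 101 may be sketched as follows. This is a minimal sketch under stated assumptions: each asset carries a rebuildable flag and an estimated compression ratio (original size divided by compressed size), rebuildable assets are deleted rather than compressed, and the names Asset and storage_report are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Asset:
    name: str
    size_bytes: int
    rebuildable: bool = False        # can be regenerated from other assets
    compression_ratio: float = 1.0   # assumed estimate; 1.0 means incompressible


def storage_report(assets: Iterable[Asset]) -> Dict[str, float]:
    """Total storage and the storage that deletion and compression could free."""
    assets = list(assets)
    total = sum(a.size_bytes for a in assets)
    # Rebuildable assets are assumed to be deleted outright.
    freed_by_deletion = sum(a.size_bytes for a in assets if a.rebuildable)
    # Remaining assets are compressed at their estimated ratio.
    freed_by_compression = sum(
        a.size_bytes - a.size_bytes / a.compression_ratio
        for a in assets if not a.rebuildable
    )
    return {
        "total": total,
        "freed_by_deletion": freed_by_deletion,
        "freed_by_compression": freed_by_compression,
        "remaining": total - freed_by_deletion - freed_by_compression,
    }
```

The dependency analysis that decides which assets are rebuildable is outside the scope of this sketch; here the flag is simply taken as given, for example from the rebuildable field of the snapshot table.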

The computer is also generally configured to present a second display configured to be viewed by an artist that includes at least one daily assignment having a context, project, shot and a status input that is configured to update the status in the task table and a timer input that is configured to update the start time and the end time in the timesheet item table.

FIG. 102 illustrates a view of an artist display showing the task context, project, shot, status, tools, start time button, check-in input, rendering input, internal shot review input, meal, start/time/stop, review and submit inputs. This enables motion picture related tasks to be viewed and updated as work on a project progresses, which gives production a specific motion picture project management related view into the status of a conversion and/or special effects movie project.

FIG. 103 illustrates an annotated view of the menu bar of the artist display. The menu bar is shown at the top left portion of the display in FIG. 102.

FIG. 104 illustrates an annotated view of the task row of the artist display. This annotated view shows only one row; however, multiple rows may be displayed as per FIG. 102.

FIG. 105 illustrates an annotated view of the main portion of the user interface of the artist display of FIG. 102.

FIG. 106 illustrates a build timeline display for the artist display for creating a timeline to work on.

FIG. 107 illustrates a browse snapshots display for the artist display that enables an artist to view the snapshot of a shot or otherwise cache important information related to a shot so that the database does not have to field requests for often utilized data. The snapshot keeps track of the locations of various files associated with a shot, and keeps track of other information related to a work product related to the shot, i.e., source, masks, resolution, file type. In addition, the snapshot keeps track of versioning of the various files and file types and optionally the versions of the tools utilized to work on the various files.

FIG. 108 illustrates an artist actual window showing the actuals of time spent on tasks, for example versus time allocated for the tasks, with dropdown menus for specific timesheets.

FIG. 109 illustrates a notes display for the artist display that enables artists to enter a note related to the shot.

FIG. 110 illustrates a check in display for the artist display that enables work to be checked in after work on the shot is complete.

The computer is generally also configured to present a third display configured to be viewed by an editor, i.e., an editorial worker, that includes an annotation frame configured to accept commentary or drawing or both commentary and drawing on the at least one of said plurality of images associated with the at least one shot. One or more embodiments of the computer may be configured to provide the third display configured to be viewed by an editor that includes an annotation overlaid on at least one of the plurality of images. This capability provides information on one display that has generally required three workers to integrate in known systems, and which is novel in and of itself.

FIG. 111 illustrates a view of an editorial display showing the project, filter inputs, timeline inputs and search results showing the shots in the main portion of the window along with the context of work and assigned worker.

FIG. 112 illustrates a view of the session manager display of the editorial display for selecting shots to review.

FIG. 113 illustrates a view of the advanced search display of the editorial display.

FIG. 114 illustrates a view of the simple search display of the editorial display.

FIG. 115 illustrates a view of the reviewing pane for a shot also showing integrated notes and/or snapshot information in the same frame. Creating this view has in the past generally required three workers; providing it on one display saves great amounts of time and greatly speeds the review process. Any type of information may be overlaid onto an image to enable consolidated disparate views of images and related data on one display.

FIG. 116 illustrates a view of the timelines to select for review and/or checkin after modification.

FIG. 117 illustrates the annotations added to a frame for feedback using the tool of FIG. 117.

One or more embodiments of the computer are configured to accept arating input from production or editorial based on work performed by theartist, optionally in a blind manner in which the reviewer does not knowthe identity of the artist in order to prevent favoritism for example.One or more embodiments of the computer are configured to accept adifficulty of the at least one shot and calculate a rating based on workperformed by the artist and based on the difficulty of the shot and timespent on the shot. One or more embodiments of the computer areconfigured to accept a rating input from production or editorial basedon work performed by the artist, or, accept a difficulty of the at leastone shot and calculate a rating based on work performed by the artistand based on the difficulty of the shot and time spent on the shot, and,signify an incentive with respect to the artist based on the ratingaccepted by the computer or calculated by the computer. One or moreembodiments of the computer are configured to estimate remaining costbased on the actuals that are based on total time spent for all of theat least one tasks associated with all of the at least one shot in theproject versus time allocated for all of the at least one tasksassociated with all of the at least one shot in the project. One or moreembodiments of the computer are configured to compare the actualsassociated with a first project with actuals associated with a secondproject and signify at least one worker to be assigned from the firstproject to the second project based on at least one rating of the firstworker that is assigned to the first project. One or more embodiments ofthe computer are configured to analyze a prospective project having anumber of shots and estimated difficulty per shot and based on actualsassociated with the project, calculate a predicted cost for theprospective project. 
One or more embodiments of the computer are configured to analyze a prospective project having a number of shots and estimated difficulty per shot and based on the actuals associated with a first previously performed project and a second previously performed project that completed after the first previously performed project, calculate a derivative of the actuals, calculate a predicted cost for the prospective project based on the derivative of the actuals. For example, as the process improves, tools improve and workers improve, the efficiency of work improves and the budgeting and bid processes can take this into account by calculating how efficiency is changing versus time and use this rate of change to predict costs for a prospective project. One or more embodiments of the computer are configured to analyze the actuals associated with said project and divide completed shots by total shots associated with said project and present a time of completion of the project. One or more embodiments of the computer are configured to analyze the actuals associated with the project and divide completed shots by total shots associated with the project, present a time of completion of the project, accept an input of at least one additional artist having a rating, accept a number of shots in which to use the additional artist, calculate a time savings based on the at least one additional artist and the number of shots, subtract the time savings from the time of completion of the project and present an updated time of completion of the project. One or more embodiments of the computer are configured to calculate amount of disk space that may be utilized to archive the project and signify at least one asset that may be rebuilt from other assets to avoid archival of the at least one asset. One or more embodiments of the computer are configured to display an error message if the artist is working with a frame number that is not current in the at least one shot. 
This may occur when fades, dissolves or other effects lengthen a particular shot for example wherein the shot contains frames not in the original source assets.
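
By way of a non-limiting illustration, the cost prediction and time of completion calculations described above may be sketched as follows. The function names, the use of a linear extrapolation for the derivative of the actuals, and the "cost per difficulty unit" measure are assumptions made for this example only and are not taken from the specification.

```python
# Hypothetical sketch; names, units and formulas are illustrative assumptions.

def efficiency_rate(first, second):
    """Rate of change of unit cost between two completed projects.

    Each project is a (completion_time, cost_per_difficulty_unit) pair,
    with the second project completing after the first.
    """
    (t1, c1), (t2, c2) = first, second
    return (c2 - c1) / (t2 - t1)


def predict_project_cost(shot_difficulties, first, second, start_time):
    """Extrapolate the unit cost linearly to start_time and apply it to all shots."""
    t2, c2 = second
    unit_cost = c2 + efficiency_rate(first, second) * (start_time - t2)
    return unit_cost * sum(shot_difficulties)


def projected_days_remaining(completed_shots, total_shots, days_elapsed):
    """Assume shots keep completing at the average rate observed so far."""
    if completed_shots == 0:
        raise ValueError("no completed shots to extrapolate from")
    fraction_done = completed_shots / total_shots
    return days_elapsed / fraction_done - days_elapsed
```

For instance, if unit cost fell from 100 to 88 over twelve months, a prospective project starting six months later would be bid at a unit cost of 82 under this assumed linear trend.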

Overview of Various Motion Picture Workflows

Feature Film and TV series Data Preparation for Colorization/Depth enhancement: Feature films are tele-cined or transferred from 35 mm or 16 mm film using a high resolution scanner such as a 10-bit SPIRIT DATACINE® or similar device to HDTV (1920 by 1080 24P) or data-cined on a laser film scanner such as that manufactured by IMAGICA® Corp. of America at a larger format 2000 lines to 4000 lines and up to 16 bits of grayscale. The high resolution frame files are then converted to standard digital files such as uncompressed TIF files or uncompressed TGA files typically in 16 bit three-channel linear format or 8 bit three channel linear format. If the source data is HDTV, the 10-bit HDTV frame files are converted to similar TIF or TGA uncompressed files at either 16-bits or 8-bit per channel. Each frame pixel is then averaged such that the three channels are merged to create a single 16 bit channel or 8 bit channel respectively. Any other scanning technologies capable of scanning an existing film to digital format may be utilized. Currently, many movies are generated entirely in digital format, and thus may be utilized without scanning the movie. For digital movies that have associated metadata, for example for movies that make use of computer-generated characters, backgrounds or any other element, the metadata can be imported for example to obtain an alpha and/or mask and/or depth for the computer-generated element on a pixel-by-pixel or sub-pixel-by-sub-pixel basis. One format of a file that contains alpha/mask and depth data is the RGBAZ file format, of which one implementation is the EXR file format.
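
As a minimal sketch of the channel merge described above, each pixel's three channel values may be averaged into a single gray value. The frame layout (rows of integer tuples) and the use of integer division are assumptions made for this example.

```python
def merge_channels(frame):
    """Average three-channel pixels into one channel.

    frame: list of rows, each row a list of (c1, c2, c3) integer tuples.
    Returns rows of single integer values (integer division is an assumption).
    """
    return [[(c1 + c2 + c3) // 3 for (c1, c2, c3) in row] for row in frame]
```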

Digitization Telecine and Format Independence: Monochrome elements of either 35 or 16 mm negative or positive film are digitized at various resolutions and bit depth within a high resolution film scanner such as that performed with a SPIRIT DATACINE® by PHILIPS® and EASTMAN KODAK® which transfers either 525 or 625 formats, HDTV, (HDTV) 1280×720/60 Hz progressive, 2K, DTV (ATSC) formats like 1920×1080/24 Hz/25 Hz progressive and 1920×1080/48 Hz/50 Hz segmented frame or 1920×1080 50I as examples. The invention provides improved methods for editing film into motion pictures. Visual images are transferred from developed motion picture film to a high definition video storage medium, which is a storage medium adapted to store images and to display images in conjunction with display equipment having a scan density substantially greater than that of an NTSC compatible video storage medium and associated display equipment. The visual images are also transferred, either from the motion picture film or the high definition video storage medium to a digital data storage format adapted for use with digital nonlinear motion picture editing equipment. After the visual images have been transferred to the high definition video storage medium, the digital nonlinear motion picture editing equipment is used to generate an edit decision list, to which the motion picture film is then conformed. The high definition video storage medium is generally adapted to store and display visual images having a scan density of at least 1080 horizontal lines. Electronic or optical transformation may be utilized to allow use of visual aspect ratios that make full use of the storage formats used in the method. This digitized film data as well as data already transferred from film to one of a multiplicity of formats such as HDTV are entered into a conversion system such as the HDTV STILL STORE® manufactured by AVICA® Technology Corporation. 
Such large scale digital buffers and data converters are capable of converting digital images to all standard formats, such as the 1080i, 720p and 1080p/24 HDTV formats. An Asset Management System server provides powerful local and server backups and archiving to standard SCSI devices, C2-level security, streamlined menu selection and multiple criteria database searches.

During the process of digitizing images from motion picture film the mechanical positioning of the film frame in the telecine machine suffers from an imprecision known as “film weave”, which cannot be fully eliminated. However various film registration and ironing or flattening gate assemblies are available such as that embodied in U.S. Pat. No. 5,328,073, Film Registration and Ironing Gate Assembly, which involves the use of a gate with a positioning location or aperture for focal positioning of an image frame of a strip film with edge perforations. Undersized first and second pins enter a pair of transversely aligned perforations of the film to register the image frame with the aperture. An undersized third pin enters a third perforation spaced along the film from the second pin and then pulls the film obliquely to a reference line extending between the first and second pins to nest against the first and second pins the perforations thereat and register the image frame precisely at the positioning location or aperture. A pair of flexible bands extending along the film edges adjacent the positioning location moves progressively into incrementally increasing contact with the film to iron it and clamp its perforations against the gate. The pins register the image frame precisely with the positioning location, and the bands maintain the image frame in precise focal position. Positioning can be further enhanced following the precision mechanical capture of images by methods such as that embodied in U.S. Pat. No. 4,903,131, Method For The Automatic Correction Of Errors In Image Registration During Film Scanning.

To remove or reduce the random structure known as grain within exposed feature film that is superimposed on the image, as well as scratches or particles of dust or other debris which obscure the transmitted light, various algorithms will be used such as that embodied in U.S. Pat. No. 6,067,125, Structure And Method For Film Grain Noise Reduction and U.S. Pat. No. 5,784,176, Method Of Image Noise Reduction Processing.

Reverse Editing of the Film Element Preliminary to Visual Database Creation:

The digital movie is broken down into scenes and cuts. The entire movie is then processed sequentially for the automatic detection of scene changes including dissolves, wipe-a-ways and cuts. These transitions are further broken down into camera pans, camera zooms and static scenes representing little or no movement. All database references to the above are entered into an edit decision list (EDL) within the database based on standard SMPTE time code or other suitable sequential naming convention. There exist a great many technologies for detecting dramatic as well as subtle transitions in film content such as:

U.S. Ser. No. 05/959,697 Sep. 28, 1999 Method And System For Detecting Dissolve Transitions In A Video Signal

U.S. Ser. No. 05/920,360 Jul. 6, 1999 Method And System For Detecting Fade Transitions In A Video Signal

U.S. Ser. No. 05/841,512 Nov. 24, 1998 Methods Of Previewing And Editing Motion Pictures

U.S. Ser. No. 05/835,163 Nov. 10, 1998 Apparatus For Detecting A Cut In A Video

U.S. Pat. No. 5,767,923 Jun. 16, 1998 Method And System For Detecting Cuts In A Video Signal

U.S. Pat. No. 5,778,108 Jul. 6, 1996 Method And System For Detecting Transitional Markers Such As Uniform Fields In A Video Signal

U.S. Pat. No. 5,920,360 Jun. 7, 1999 Method And System For Detecting Fade Transitions In A Video Signal

All cuts that represent the same content such as in a dialog between two or more people where the camera appears to volley between the two talking heads are combined into one file entry for later batch processing.

An operator checks all database entries visually to ensure that:

1. Scenes are broken down into camera moves

2. Cuts are consolidated into single batch elements where appropriate

3. Motion is broken down into simple and complex depending on occlusion elements, number of moving objects and quality of the optics (e.g., softness of the elements, etc).

Pre-Production - Scene Analysis and Scene Breakdown for Reference Frame ID and Data Base Creation:

Files are numbered using sequential SMPTE time code or other sequential naming convention. The image files are edited together at 24-frame/sec speed (without field related 3/2 pull down which is used in standard NTSC 30 frame/sec video) onto a DVD using ADOBE® AFTER EFFECTS® or similar programs to create a running video with audio of the feature film or TV series. This is used to assist with scene analysis and scene breakdown.
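
As one hedged example of the sequential numbering described above, a frame index may be converted to an HH:MM:SS:FF label at 24 frames per second. The function name and the non-drop-frame counting are assumptions made for this sketch.

```python
def frame_to_timecode(frame_index, fps=24):
    """Convert a zero-based frame index to a non-drop-frame HH:MM:SS:FF string."""
    frames = frame_index % fps
    total_seconds = frame_index // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```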

Scene and Cut Breakdown:

1. A database permits the entering of scene, cut, design, key frame and other critical data in time code format as well as descriptive information for each scene and cut.

2. Each scene cut is identified relative to camera technique. Time codes are recorded for pans, zooms, static backgrounds, static backgrounds with unsteady or drifting camera and unusual camera cuts that require special attention.

3. Designers and assistant designers study the feature film for color clues and color references or for the case of depth projects, the film is studied for depth clues, generally for non-standard sized objects. Research is provided for color/depth accuracy where applicable. The Internet for example may be utilized to determine the color of a particular item or the size of a particular item. For depth projects, knowing the size of an object allows for the calculation of the depth of an item in a scene for example. For depth projects related to converting two-dimensional movies to three-dimensional movies where depth metadata is available for computer-generated elements within the movies, the depth metadata can be scaled, or translated or otherwise normalized to the coordinate system or units used for the background and motion elements for example.

4. Single frames from each scene are selected to serve as design frames. These frames are color designed or metadata is imported for depth and/or mask and/or alpha for computer-generated elements, or depth assignments (see FIGS. 42-70) are made to background elements or motion elements in the frames to represent the overall look and feel of the feature film. Approximately 80 to 100 design frames are typical for a feature film.

5. In addition, single frames called key frames from each cut of the feature film are selected that contain all the elements within each cut that require color/depth consideration. There may be as many as 1,000 key frames. These frames will contain all the color/depth transform information necessary to apply color/depth to all sequential frames in each cut without additional color choices.
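
As a non-limiting illustration of item 3 above, depth may be estimated from a known object size, and imported depth metadata may be scaled and translated into project units. The pinhole camera relation, the function names and the parameters below are assumptions made for this sketch; the specification does not state a particular formula.

```python
def depth_from_size(focal_length_px, real_size, size_in_pixels):
    """Pinhole-camera estimate (assumed model): depth = f * real_size / image_size.

    The result is in the same units as real_size.
    """
    return focal_length_px * real_size / size_in_pixels


def normalize_depth(values, scale, offset):
    """Scale and translate imported depth metadata into the project's depth units."""
    return [v * scale + offset for v in values]
```

For example, an object known to be 1.8 units tall that spans 90 pixels under an assumed focal length of 1000 pixels would be placed about 20 units from the camera.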

Color/Depth Selection:

Historical reference, studio archives and film analysis provide the designer with color references. Using an input device such as a mouse, the designer masks features in a selected single frame containing a plurality of pixels and assigns color to them using an HSL color space model based on creative considerations and the grayscale and luminance distribution underlying each mask. One or more base colors are selected for image data under each mask and applied to the particular luminance pattern attributes of the selected image feature. Each color selected is applied to an entire masked object or to the designated features within the luminance pattern of the object based on the unique gray-scale values of the feature under the mask.

A lookup table or color transform for the unique luminance pattern of the object or feature is thus created which represents the color-to-luminance values applied to the object. Since the color applied to the feature extends the entire range of potential grayscale values from dark to light, the designer can ensure that as the distribution of the gray-scale values representing the pattern changes homogeneously into dark or light regions within subsequent frames of the movie, such as with the introduction of shadows or bright light, the color for each feature also remains consistently homogeneous and correctly lightens or darkens with the pattern upon which it is applied.
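
A minimal sketch of such a lookup table is given below, assuming 8-bit gray values and a designer-selected hue and saturation, with lightness taken directly from the underlying gray value. The function names are hypothetical; the Python standard library `colorsys.hls_to_rgb` call is used only for the HSL conversion.

```python
import colorsys


def build_color_lut(hue, saturation, levels=256):
    """Map every gray level to an RGB color with a fixed hue and saturation.

    hue and saturation are in [0, 1]; lightness follows the gray level, so the
    color darkens in shadow and lightens in bright regions.
    """
    lut = []
    for gray in range(levels):
        lightness = gray / (levels - 1)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        lut.append((round(r * 255), round(g * 255), round(b * 255)))
    return lut


def apply_lut(gray_frame, mask, lut):
    """Color the masked pixels through the table; leave the others gray."""
    return [
        [lut[g] if m else (g, g, g) for g, m in zip(gray_row, mask_row)]
        for gray_row, mask_row in zip(gray_frame, mask)
    ]
```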

Depth can be imported for computer-generated objects where metadata exists and/or can be assigned to objects and adjusted using embodiments of the invention using an input device such as a mouse to assign objects particular depths including contour depths, e.g., geometric shapes such as an ellipsoid to a face for example. This allows objects to appear natural when converted to three-dimensional stereoscopic images. For computer-generated elements, the imported depth and/or alpha and/or mask shape can be adjusted if desired. Assigning a fixed distance to foreground objects tends to make the objects appear as cut-outs, i.e., flat. See also FIGS. 42-70.

Propagation of Mask Color Transform/Depth Information from One Frame to a Series of Subsequent Frames:

The masks representing designed selected color transforms/depth contours in the single design frame are then copied to all subsequent frames in the series of movie frames by one or more methods such as auto-fitting Bezier curves to edges, automatic mask fitting based on Fast Fourier Transforms and Gradient Descent Calculation tied to luminance patterns in a subsequent frame relative to the design frame or successive preceding frames, mask paint to a plurality of successive frames by painting the object within only one frame, auto-fitting vector points to edges and copying and pasting individual masks or a plurality of masks to selected subsequent frames. In addition, depth information may be “tweened” to account for forward/backward motion or zooming with respect to the camera capture location. For computer-generated elements, the alpha and/or mask data is generally correct and may be skipped for reshaping processes since the metadata associated with computer-generated elements is obtained digitally from the original model of an object and hence does not require adjustment in general. (See FIG. 37C, step 3710 for setting mask fit location to border of CG element to potentially skip large amounts of processing in fitting masks in subsequent frames to reshape the edges to align a photographic element). Optionally, computer-generated elements may be morphed or reshaped to provide special effects not originally in a movie scene.

Single Frame Set Design and Colorization:

In embodiments of the invention, camera moves are consolidated and separated from motion elements in each scene by the creation of a montage or composite image of the background from a series of successive frames into a single frame containing all background elements for each scene and cut. The resulting single frame becomes a representation of the entire common background of a multiplicity of frames in a movie, creating a visual database of all elements and camera offset information within those frames.

In this manner most set backgrounds can be designed and colorized/depth enhanced in one pass using a single frame montage. Each montage is masked without regard to the foreground moving objects, which are masked separately. The background masks of the montage are then automatically extracted from the single background montage image and applied to the subsequent frames that were used to create the single montage using all the offsets stored in the image data for correctly aligning the masks to each subsequent frame.

There is a basic formula in filmmaking that varies little within and between feature films (except for those films employing extensive hand-held or stabilized camera shots.) Scenes are composed of cuts, which are blocked for standard camera moves, i.e., pans, zooms and static or locked camera angles as well as combinations of these moves. Cuts are either single occurrences or a combination of cut-a-ways where there is a return to a particular camera shot such as in a dialog between two individuals. Such cut-a-ways can be considered a single scene sequence or single cut and can be consolidated in one image-processing pass.

Pans can be consolidated within a single frame visual database using special panorama stitching techniques but without lens compensation. Each frame in a pan involves:

1. The loss of some information on one side, top and/or bottom of the frame

2. Common information in the majority of the frame relative to the immediately preceding and subsequent frames and

3. New information on the other side, top and/or bottom of the frame.

By stitching these frames together based on common elements within successive frames and thereby creating a panorama of the background elements a visual database is created with all pixel offsets available for referencing in the application of a single mask overlay to the complete set of sequential frames.
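
The compositing step may be sketched as follows, assuming the horizontal pixel offset of each frame within the panorama has already been determined by a registration step that is not shown. The function name and the first-frame-wins rule for common pixels are assumptions made for this example.

```python
def stitch_pan(frames, offsets):
    """Composite equal-height frames of a horizontal pan into one montage.

    frames: list of 2D lists of pixel values.
    offsets: horizontal pixel offset of each frame within the montage
             (assumed to be known from a prior registration step).
    Pixels common to several frames keep the value from the earliest frame.
    """
    height = len(frames[0])
    width = max(off + len(frame[0]) for frame, off in zip(frames, offsets))
    montage = [[None] * width for _ in range(height)]
    for frame, off in zip(frames, offsets):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                if montage[y][off + x] is None:
                    montage[y][off + x] = value
    return montage
```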

Creation of a Visual Database:

Since each pixel within a single frame visual database of a background corresponds to an appropriate address within the respective “raw” (unconsolidated) frame from which it was created, any designer determined masking operation and corresponding masking lookup table designation applied to the visual database will be correctly applied to each pixel's appropriate address within the raw film frames that were used to create the single frame composite.
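
As a hedged sketch of this address correspondence, a mask drawn on the single frame composite may be cut back out for any raw frame using that frame's stored offset. The function name and the rectangular, translation-only offset model are assumptions made for this example.

```python
def extract_frame_mask(montage_mask, offset_x, offset_y, width, height):
    """Return the part of the montage mask that overlies one raw frame.

    offset_x, offset_y: position of the raw frame's top-left pixel in the montage.
    width, height: dimensions of the raw frame in pixels.
    """
    rows = montage_mask[offset_y:offset_y + height]
    return [row[offset_x:offset_x + width] for row in rows]
```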

In this manner, sets for each scene and cut are each represented by a single frame (the visual database) in which pixels have either single or multiple representations within the series of raw frames from which they were derived. All masking within a single visual database frame will create a one-bit mask per region representation of an appropriate lookup table that corresponds to either common or unique pixel addresses within the sequential frames that created the single composite frame. These address-defined masking pixels are applied to the full resolution frames where total masking is automatically checked and adjusted where necessary using feature, edge detection and pattern recognition routines. Where adjustments are required, i.e., where applied masked region edges do not correspond to the majority of feature edges within the gray scale image, a “red flag” exception comment signals the operator that frame-by-frame adjustments may be necessary.

Single Frame Representation of Motion within Multiple Frames:

The differencing algorithm used for detecting motion objects will generally be able to differentiate dramatic pixel region changes that represent moving objects from frame to frame. In cases where cast shadows on a background from a moving object may be confused with the moving object the resulting masks will be assigned to a default alpha layer that renders that part of the moving object mask transparent. In some cases an operator using one or more vector or paint tools will designate the demarcation between the moving object and cast shadow. In most cases however, the cast shadows will be detected as an extraneous feature relative to the two key motion objects. In this invention cast shadows are handled by the background lookup table that automatically adjusts color along a luminance scale determined by the spectrum of light and dark gray scale values in the image.

Action within each frame is isolated via differencing or frame-to-frame subtraction techniques that include vector (both directional and speed) differencing (i.e., where action occurs within a pan) as well as machine vision techniques, which model objects and their behaviors. Difference pixels are then composited as a single frame (or isolated in a tiling mode) representing a multiplicity of frames thus permitting the operator to window regions of interest and otherwise direct image processing operations for computer controlled subsequent frame masking.
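
The simplest form of frame-to-frame subtraction may be sketched as below. The threshold value and function name are assumptions made for this example, and the vector differencing and machine vision techniques mentioned above are not shown.

```python
def difference_mask(prev_frame, next_frame, threshold=16):
    """Mark pixels whose gray value changes by more than threshold between frames.

    prev_frame, next_frame: equal-sized 2D lists of gray values.
    Returns a 2D list of 1 (difference pixel) or 0 (static pixel).
    """
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]
```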

As with the set or background montage discussed above, action taking place in multiple frames within a scene can be represented by a single frame visual database in which each unique pixel location undergoes appropriate one bit masking from which corresponding lookup tables are applied. However, unlike the set or background montage in which all color/depth is applied and designated within the single frame pass, the purpose of creating an action composite visual database is to window or otherwise designate each feature or region of interest that will receive a particular mask and apply region of interest vectors from one key frame element to subsequent key frame elements, thus providing operator assistance to the computer processing that will track each region of interest.

During the design phase, masks are applied to designer designated regions of interest for a single instance of a motion object appearing within the background (i.e., a single frame of action appears within the background or stitched composited background in the proper x, y coordinates within the background corresponding to the single frame of action from which it was derived). Using an input device such as a mouse the operator uses the following tools in creating the regions of interest for masking. Alternatively, projects having associated computer-generated element metadata may import and if necessary, scale the metadata to the units utilized for depth in the project. Since these masks are digitally created, they can be assumed to be accurate throughout the scene and thus the outlines and depths of the computer-generated areas may be ignored for reshaping operations. Elements that border these objects may thus be more accurately reshaped since the outlines of the computer-generated elements are taken as correct. Hence, even for computer-generated elements having the same underlying gray scale of a contiguous motion or background element, the shape of the mask at the junction can be taken to be accurate even though there is no visual difference at the junction. Again, see FIG. 37C, step 3710 for setting mask fit location to border of CG element to potentially skip large amounts of processing in fitting masks in subsequent frames to reshape the edges to align a photographic element.

1. A combination of edge detection algorithms such as standard Laplacian filters and pattern recognition routines

2. Automatic or assisted closing of regions

3. Automatic seed fill of selected regions

4. Bimodal luminance detection for light or dark regions

5. An operator-assisted sliding scale and other tools create a “best fit” distribution index corresponding to the dynamic range of the underlying pixels as well as the underlying luminance values, pattern and weighted variables

6. Subsequent analysis of underlying gray scale, luminance, area, pattern and multiple weighting characteristics relative to immediately surrounding areas creating a unique determination/discrimination set called a Detector File.
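
As an illustration of the first tool listed above, a standard 4-neighbour Laplacian filter may be applied to a gray scale region and thresholded to mark candidate edge pixels. The function names, the treatment of border pixels and the threshold are assumptions made for this sketch; the pattern recognition routines are not shown.

```python
def laplacian(gray):
    """Apply the 4-neighbour Laplacian kernel to a 2D list of gray values.

    Border pixels are left at 0 because they lack a full neighbourhood.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = (
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return out


def edge_pixels(gray, threshold):
    """Mark pixels whose Laplacian magnitude meets or exceeds threshold."""
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in laplacian(gray)]
```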

In the pre-production key frame phase: The composited single, design motion database described above is presented along with all subsequent motion inclusive of selected key frame motion objects. All motion composites can be toggled on and off within the background or viewed in motion within the background by turning each successive motion composite on and off sequentially.

Key Frame Motion Object Creation: The operator windows all masked regions of interest on the design frame in succession and directs the computer by various pointing instruments and routines to the corresponding location (regions of interest) on selected key frame motion objects within the visual database thereby reducing the area on which the computer must operate (i.e., the operator creates a vector from the design frame moving object to each subsequent key frame moving object following a close approximation to the center of the region of interest represented within the visual database of the key frame moving object. This operator-assisted method restricts the required detection operations that must be performed by the computer in applying masks to the corresponding regions of interest in the raw frames).

In the production phase: The composited key frame motion object database described above is presented along with all subsequent motion inclusive of fully masked selected key frame motion objects. As above, all motion composites can be toggled on and off within the background or sequentially turned on and off in succession within the background to simulate actual motion. In addition, all masked regions (regions of interest) can be presented in the absence of their corresponding motion objects. In such cases the one-bit color masks are displayed as either translucent or opaque arbitrary colors.

During the production process and under operator visual control, each region of interest within subsequent motion object frames, between two key motion object frames, undergoes a computer masking operation. The masking operation involves a comparison of the masks in a preceding motion object frame with the new or subsequent Detector File operation and underlying parameters (i.e., mask dimensions, gray scale values and multiple weighting factors that lie within the vector of parameters in the subsequent key frame motion object) in the successive frame. This process is aided by the windowing or pointing (using various pointing instruments) and vector application within the visual database. If the values within an operator assisted detected region of the subsequent motion object fall within the range of the corresponding region of the preceding motion object, relative to the surrounding values, and if those values fall along a trajectory of values (vectors) anticipated by a comparison of the first key frame and the second key frame, then the computer will determine a match and will attempt a best fit.

The uncompressed, high resolution images all reside at the server level; all subsequent masking operations on the regions of interest are displayed on the compressed composited frame in display memory or on a tiled, compressed frame in display memory so that the operator can determine correct tracking and matching of regions. A zoomed region of interest window showing the uncompressed region is displayed on the screen to determine visually the region of interest best fit. This high-resolution window is also capable of full motion viewing so that the operator can determine whether the masking operation is accurate in motion.

In a first embodiment as shown in FIG. 1, a plurality of feature film or television film frames 14 a-n represent a scene or cut in which there is a single instance or perspective of a background 16 (FIG. 3). In the scene 10 shown, several actors or motion elements 18′, 18″ and 18′″ are moving within an outdoor stage and the camera is performing a pan left. FIG. 1 shows selected samples of the 120 total frames 14 making up the 5-second pan.

FIG. 2 shows an isolated background 16 processed scene from the plurality of frames 14 a-n represented in FIG. 1, in which all motion elements 18 are removed using various subtraction and differencing techniques. The separate frames that created the pan are combined into a visual database in which unique and common pixels from each of the 120 frames 14 composing the original pan are represented in the single composite background image 12 shown in FIG. 3. The single background image 12 is then used to create a background mask overlay 20 representing designer selected color lookup tables in which dynamic pixel colors automatically compensate or adjust for moving shadows and other changes in luminance. For depth projects, any object in the background may be assigned any depth. A variety of tools may be utilized to perform the assignment of depth information to any portion of the background including paint tools, geometric icon based tools that allow setting a contour depth to an object, or text field inputs to allow for numeric inputs. The composite background shown in FIG. 2 for example may also have a ramp function assigned to allow for a nearer depth to be assigned to the left portion of the scene and a linear increase in depth to the right of the image to be automatically assigned. See also FIGS. 42-70.
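
The ramp function mentioned above may be sketched as a per-column depth that increases linearly from the left edge to the right edge of the composite background. The function name, parameters and depth units are assumptions made for this example.

```python
def depth_ramp(width, near_depth, far_depth):
    """Return one depth per pixel column, rising linearly from left to right.

    near_depth applies to the leftmost column and far_depth to the rightmost.
    """
    if width == 1:
        return [near_depth]
    step = (far_depth - near_depth) / (width - 1)
    return [near_depth + step * x for x in range(width)]
```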

In one illustrative embodiment of this invention, operator assisted and automated operations are used to detect obvious anchor points represented by clear edge detected intersects and other contiguous edges in each frame 14 making up the single composite image 12 and overlaid mask 20. These anchor points are also represented within the composite image 12 and are used to aid in the correct assignment of the mask to each frame 14 represented by the single composite image 12.

Anchor points and objects and/or areas that are clearly defined by closed or nearly closed edges are designated as a single mask area and given a single lookup table. Within those clearly delineated regions, polygons are created of which anchor points are dominant points. Where there is no clear edge detected to create a perfectly closed region, polygons are generated using the edge of the applied mask.

The resulting polygon mesh includes the interior of anchor point dominant regions plus all exterior areas between those regions.

Pattern parameters created by the distribution of luminance within each polygon are registered in a database for reference when corresponding polygonal addresses of the overlying masks are applied to the appropriate addresses of the frames which were used to create the composite single image 12.

In FIG. 3, a representative sample of each motion object (M-Object) 18 in the scene 10 receives a mask overlay that represents designer selected color lookup tables/depth assignments in which dynamic pixel colors automatically compensate or adjust for moving shadows and other changes in luminance as the M-Object 18 moves within the scene 10. The representative samples are each considered Key M-Objects 18 that are used to define the underlying patterns, edges, grouped luminance characteristics, etc., within the masked M-Object 18. These characteristics are used to translate the design masks from one Key M-Object 18 a to subsequent M-Objects 18 b along a defined vector of parameters leading to Key M-Object 18 c, each Subsequent M-Object becoming the new Key M-Object in succession as masks are applied. As shown, Key M-Object 18 a may be assigned a depth of 32 feet from the camera capture point while Key M-Object 18 c may be assigned a depth of 28 feet from the camera capture point. The various depths of the object may be “tweened” between the various depth points to allow for realistic three-dimensional motion to occur within the cut without for example requiring wire frame models of all of the objects in a frame.
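
A minimal sketch of such tweening, assuming simple linear interpolation between the depths of two Key M-Objects, is shown below. The function name and the choice of linear interpolation are assumptions; other interpolation curves may equally be used.

```python
def tween_depth(start_depth, end_depth, frame_count):
    """Linearly interpolate depth over frame_count frames, endpoints included."""
    if frame_count == 1:
        return [start_depth]
    span = end_depth - start_depth
    return [start_depth + span * i / (frame_count - 1) for i in range(frame_count)]
```

With the depths given above, `tween_depth(32.0, 28.0, 5)` moves the object from 32 feet to 28 feet in one-foot steps over five frames.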

As with the background operations above, operator assisted and automated operations are used to detect obvious anchor points represented by clear edge detected intersects and other contiguous edges in each motion object used to create a keyframe.

Anchor points and specific regions of interest within each motion object that are clearly defined by closed or nearly closed edges are designated as a single mask area and given a single lookup table. Within those clearly delineated regions, polygons are created of which anchor points are dominant points. Where there is no clear edge detected to create a perfectly closed region, polygons are generated using the edge of the applied mask.

The resulting polygon mesh includes the interior of the anchor point dominant regions plus all exterior areas between those regions.

Pattern parameters created by the distribution of luminance values within each polygon are registered in a database for reference when corresponding polygonal addresses of the overlying masks are applied to the appropriate addresses of the frames that were used to create the composite single frame 12.

The greater the polygon sampling, the more detailed the assessment of the underlying luminance values and the more precise the fit of the overlying mask.

Subsequent or in-between motion key frame objects 18 are processed sequentially. The group of masks comprising the motion key frame object remains in its correct address location in the subsequent frame 14 or in the subsequent instance of the next motion object 18. The mask is shown as an opaque or transparent color. An operator indicates each mask in succession with a mouse or other pointing device, along with its corresponding location in the subsequent frame and/or instance of the motion object. The computer then uses the prior anchor point and corresponding polygons representing both underlying luminance texture and mask edges to create a best fit to the subsequent instance of the motion object.

The next instance of the motion object 18 is operated upon in the same manner until all motion objects 18 in a cut 10 and/or scene are completed between key motion objects.

In FIG. 4, all mask elements of the scene 10 are then rendered to create a fully colored and/or depth enhanced frame in which M-Object 18 masks are applied to each appropriate frame in the scene followed by the background mask 20, which is applied only where there is no pre-existing mask in a Boolean manner. Foreground elements are then applied to each frame 14 according to a pre-programmed priority set. Aiding the accurate application of background masks 20 are vector points which are applied by the designer to the visual database at the time of masking where there are well defined points of reference such as edges and/or distinct luminance points. These vectors create a matrix of reference points assuring accuracy of rendering masks to the separate frames that compose each scene. The applied depths of the various objects determine the amount of horizontal translation applied when generating left and right viewpoints as utilized in three-dimensional viewing, as one skilled in the art will appreciate. In one or more embodiments of the invention, the desired objects may be dynamically displayed while being shifted by an operator, who sets and observes a realistic depth. In other embodiments of the invention, the depth value of an object determines the horizontal shift applied, as one skilled in the art will recognize and as taught in at least U.S. Pat. No. 6,031,564, to Ma et al., the specification of which is hereby incorporated herein by reference.
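The Boolean application of the background mask, which fills only those pixels that carry no pre-existing mask, may be sketched as follows. The label value `NO_MASK`, the list-of-lists mask layout and the function name are assumptions made for illustration only.

```python
NO_MASK = 0  # hypothetical label meaning "no mask assigned at this pixel"


def apply_background_mask(frame_masks, background_mask):
    """Fill only the unmasked pixels of frame_masks with background mask labels."""
    return [
        [bg if fg == NO_MASK else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(frame_masks, background_mask)
    ]


# Label 5 is an M-Object mask; labels 8 and 9 belong to the background mask.
m_object_masks = [[0, 5, 5], [0, 0, 5]]
background = [[9, 9, 9], [8, 8, 8]]
print(apply_background_mask(m_object_masks, background))  # [[9, 5, 5], [8, 8, 5]]
```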

The operator employs several tools to apply masks to successive movie frames.

Display: A key frame that includes all motion objects for that frame is fully masked and loaded into the display buffer along with a plurality of subsequent frames in thumbnail format; typically 2 seconds or 48 frames.

FIGS. 5A and 5B show a series of sequential frames 14 a-n loaded into display memory in which one frame 14 is fully masked with the background (key frame) and ready for mask propagation to the subsequent frames 14 via automatic mask fitting methods.

All frames 14 along with associated masks and/or applied color transforms/depth enhancements can also be displayed sequentially in real-time (24 frames/sec) using a second (child) window to determine if the automatic masking operations are working correctly. In the case of depth projects, stereoscopic glasses or red/blue anaglyph glasses may be utilized to view both viewpoints corresponding to each eye. Any type of depth viewing technology may be utilized to view depth enhanced images, including video displays that require no stereoscopic glasses yet which utilize more than two image pairs, which may be created utilizing embodiments of the invention.

FIGS. 6A and 6B show the child window displaying an enlarged and scalable single image of the series of sequential images in display memory. The child window enables the operator to manipulate masks interactively on a single frame or in multiple frames during real-time or slowed motion.

Mask Modification: Masks can be copied to all or selected frames and automatically modified in thumbnail view or in the preview window. In the preview window, mask modification takes place on either individual frames in the display or on multiple frames during real-time motion.

Propagation of Masks to Multiple Sequential Frames in Display Memory: Key Frame masks of foreground motion objects are applied to all frames in the display buffer using various copy functions:

Copy all masks in one frame to all frames;

Copy all masks in one frame to selected frames;

Copy selected mask or masks in one frame to all frames;

Copy selected mask or masks in one frame to selected frames; and

Create masks generated in one frame with immediate copy at the same addresses in all other frames.

Referring now to FIGS. 7A and 7B, a single mask (flesh) is propagated automatically to all frames 14 in the display memory. The operator could designate selective frames to apply the selected mask or indicate that it is applied to all frames 14. The mask is a duplication of the initial mask in the first fully masked frame. Modifications of that mask occur only after it has been propagated.

As shown in FIG. 8, all masks associated with the motion object are propagated to all sequential frames in display memory. The images show the displacement of the underlying image data relative to the mask information.

None of the propagation methods listed above actively fits the masks to objects in the frames 14. They only apply the same mask shape and associated color transform information from one frame, typically the key frame, to all other frames or selected frames.

Masks are adjusted to compensate for object motion in subsequent frames using various tools based on luminance, pattern and edge characteristics of the image.

Automatic Mask Fitting: Successive frames of a feature film or TV episode exhibit movement of actors and other objects. These objects are designed in a single representative frame within the current embodiment such that operator selected features or regions have unique color transformations identified by unique masks, which encompass the entire feature. The purpose of the mask-fitting tool is to provide an automated means for correct placement and reshaping of each mask region of interest (ROI) in successive frames such that the mask accurately conforms to the correct spatial location and two dimensional geometry of the ROI as it displaces from the original position in the single representative frame. This method is intended to permit propagation of a mask region from an original reference or design frame to successive frames, automatically enabling it to adjust shape and location to each image displacement of the associated underlying image feature. For computer-generated elements, the associated masks are digitally created and can be assumed to be accurate throughout the scene, and thus the outlines and depths of the computer-generated areas may be ignored for automatic mask fitting or reshaping operations. Elements that border these objects may thus be more accurately reshaped since the outlines of the computer-generated elements are taken as correct. Hence, even for computer-generated elements having the same underlying gray scale as a contiguous motion or background element, the shape of the mask at the junction can be taken to be accurate even though there is no visual difference at the junction. Hence, whenever automatic mask fitting reshapes a mask that borders a computer-generated element mask, the computer-generated element mask can be utilized to define the border of the operator-defined mask as per step 3710 of FIG. 37C. This saves processing time since automatic mask fitting in a scene with numerous computer-generated element masks can be minimized.

The method for automatically modifying the location of, and correctly fitting, all masks in an image to compensate for movement of the corresponding image data between frames involves the following:

Set Reference Frame Mask and Corresponding Image Data:

1. A reference frame (frame 1) is masked by an operator using a variety of means such as paint and polygon tools so that all regions of interest (i.e., features) are tightly covered.

2. The minimum and maximum x,y coordinate values of each masked region are calculated to create rectangular bounding boxes around each masked region encompassing all underlying image pixels of each masked region.

3. A subset of pixels is identified for each region of interest within its bounding rectangle (e.g., every 10th pixel).
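Steps 2 and 3 above can be sketched as follows. The mask is assumed to be a list of rows holding nonzero values where the region is masked; the function names and that data layout are hypothetical, and the spacing of 10 pixels follows the example in the text.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels in a 2D mask."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        raise ValueError("mask is empty")
    return min(xs), min(ys), max(xs), max(ys)


def fit_grid(mask, spacing=10):
    """Return every spacing-th masked pixel location inside the bounding box."""
    x_min, y_min, x_max, y_max = bounding_box(mask)
    return [
        (x, y)
        for y in range(y_min, y_max + 1, spacing)
        for x in range(x_min, x_max + 1, spacing)
        if mask[y][x]
    ]


# A 40x30 frame with one rectangular masked region (x 5..26, y 3..24).
mask = [[1 if 5 <= x <= 26 and 3 <= y <= 24 else 0 for x in range(40)] for y in range(30)]
print(bounding_box(mask))   # (5, 3, 26, 24)
print(len(fit_grid(mask)))  # 9
```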

Copy Reference Frame Mask and Corresponding Image Data To All Subsequent Frames: The masks, bounding boxes and corresponding subset of pixel locations from the reference frame are copied over to all subsequent frames by the operator.

Approximate Offset of Regions Between Reference Frame and the Next Subsequent Frame:

1. Fast Fourier Transforms (FFTs) are calculated to approximate image data displacements between frame 1 and frame 2.

2. Each mask in frame 2, with its accompanying bounding box, is moved to compensate for the displacement of corresponding image data from frame 1 using the FFT calculation.

3. The bounding box is augmented by an additional margin around the region to accommodate other motion and shape morphing effects.

Fit Masks to the New Location:

1. Using the vector of offset determined by the FFT, a gradient descent of minimum errors is calculated in the image data underlying each mask by:

2. Creating a fit box around each pixel within the subset of the bounding box.

3. Calculating a weighted index of all pixels within the fit box using a bilinear interpolation method.

4. Determining offset and best fit to each subsequent frame using gradient descent calculations to fit the mask to the desired region.

Mask fit initialization: An operator selects image features in a single selected frame of a scene (the reference frame) and creates masks which contain all color transforms (color lookup tables) for the underlying image data for each feature. The selected image features that are identified by the operator have well-defined geometric extents which are identified by scanning the features underlying each mask for minimum and maximum x, y coordinate values, thereby defining a rectangular bounding box around each mask.

The Fit Grid used for Fit Grid Interpolation: For optimization purposes, only a sparse subset of the relevant mask-extent region pixels within each bounding box are fit with the method; this subset of pixels defines a regular grid in the image, as labeled by the light pixels of FIG. 9A.

The “small dark” pixels shown in FIG. 9B are used to calculate a weighted index using bilinear interpolation. The grid spacing is currently set at 10 pixels, so that essentially no more than 1 in 50 pixels are presently fit with a gradient descent search. This grid spacing could be a user controllable parameter.

Fast Fourier Transform (FFT) to Estimate Displacement Values: Masks with corresponding rectangular bounding boxes and fit grids are copied to subsequent frames. Forward and inverse FFTs are calculated between the reference frame and the next subsequent frame to determine the x,y displacement values of image features corresponding to each mask and bounding box. This method generates a correlation surface, the largest value of which provides a “best fit” position for the corresponding feature's location in the search image. Each mask and bounding box is then adjusted within the second frame to the proper x,y locations.

Fit Value Calculation (Gradient Descent Search): The FFT provides a displacement vector, which directs the search for ideal mask fitting using the Gradient Descent Search method. Gradient descent search requires that the translation or offset be less than the radius of the basin surrounding the minimum of the matching error surface. A successful FFT correlation for each mask region and bounding box will create the minimum requirements.

Searching for a Best Fit on the Error Surface: An error surface calculation in the Gradient Descent Search method involves calculating mean squared differences of pixels in the square fit box centered on reference image pixel (x0, y0), between the reference image frame and the corresponding (offset) location (x,y) on the search image frame, as shown in FIGS. 10A, B, C and D.

Corresponding pixel values in two (reference and search) fit boxes are subtracted, squared, summed/accumulated, and the square-root of the resultant sum finally divided by the number of pixels in the box (#pixels = height × width = height²) to generate the root mean square fit difference (“Error”) value at the selected fit search location:

$\text{Error}(x_{0},y_{0};x,y) = \frac{\sum_{i}\sum_{j}\left( \text{reference box}(x_{0},y_{0})\,\text{pixel}[i,j] - \text{search box}(x,y)\,\text{pixel}[i,j] \right)^{2}}{\text{height}^{2}}$
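A minimal sketch of the fit error follows. It implements the formula as printed, that is, the sum of squared differences divided by the pixel count with no square root taken. The images are assumed to be lists of rows of numbers, both fit boxes are assumed to lie fully inside their images, and the parameter name `radius` (with box width 2 × radius + 1) and the function name are illustrative choices.

```python
def fit_error(reference, search, x0, y0, x, y, radius):
    """Mean squared difference between two square fit boxes of width 2*radius+1."""
    size = 2 * radius + 1
    total = 0.0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            diff = reference[y0 + j][x0 + i] - search[y + j][x + i]
            total += diff * diff
    return total / (size * size)


# The search image is the reference image moved one pixel to the right.
reference = [[x + 10 * y for x in range(8)] for y in range(8)]
search = [[(x - 1) + 10 * y for x in range(8)] for y in range(8)]
print(fit_error(reference, search, 3, 3, 4, 3, 1))  # 0.0 at the true offset
print(fit_error(reference, search, 3, 3, 3, 3, 1))  # 1.0 with no offset applied
```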

Fit Value Gradient: The displacement vector data derived from the FFT creates a search fit location, and the error surface calculation begins at that offset position, proceeding down (against) the gradient of the error surface to a local minimum of the surface, which is assumed to be the best fit. This method finds the best fit for each next frame pixel or group of pixels based on the previous frame, using normalized squared differences, for instance in a 10×10 box, and finding a minimum down the mean squared difference gradients. This technique is similar to a cross correlation but with a restricted sampling box for the calculation. In this way the corresponding fit pixel in the previous frame can be checked for its mask index, and the resulting assignment is complete.

FIGS. 11A, B and C show a second search box derived from a descent down the error surface gradient (evaluated separately), for which the evaluated error function is reduced, possibly minimized, with respect to the original reference box (evident from visual comparison of the boxes with the reference box in FIGS. 10A, B, C and D).

The error surface gradient is calculated as per definition of the gradient. Vertical and horizontal error deviations are evaluated at four positions near the search box center position, and combined to provide an estimate of the error gradient for that position. The gradient component evaluation is explained with the help of FIG. 12.

The gradient of a surface S at coordinate (x,y) is given by the directional derivatives of the surface:

$\text{gradient}(x,y) = \left[ \frac{dS(x,y)}{dx},\ \frac{dS(x,y)}{dy} \right],$

which for the discrete case of the digital image is provided by:

$\text{gradient}(x,y) = \left[ \frac{\text{Error}(x+dx,y) - \text{Error}(x-dx,y)}{2 \cdot dx},\ \frac{\text{Error}(x,y+dy) - \text{Error}(x,y-dy)}{2 \cdot dy} \right]$

where dx, dy are one-half the box-width or box-height, also defined as the fit-box “box-radius”: box-width = box-height = 2×box-radius+1.
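The discrete gradient and a descent along it might be sketched as below. The callable `error(x, y)` stands in for the fit error evaluated at a search location, and the one-pixel step size, the stopping rule and the function names are assumptions for illustration; the specification does not state them.

```python
def error_gradient(error, x, y, d):
    """Central-difference estimate of the error-surface gradient at (x, y)."""
    gx = (error(x + d, y) - error(x - d, y)) / (2 * d)
    gy = (error(x, y + d) - error(x, y - d)) / (2 * d)
    return gx, gy


def _sign(value):
    return (value > 0) - (value < 0)


def descend(error, x, y, d=1, max_steps=100):
    """Step one pixel at a time against the gradient until the error stops improving."""
    best = error(x, y)
    for _ in range(max_steps):
        gx, gy = error_gradient(error, x, y, d)
        nx, ny = x - _sign(gx), y - _sign(gy)  # move opposite to the gradient
        candidate = error(nx, ny)
        if candidate >= best:
            break
        x, y, best = nx, ny, candidate
    return x, y, best


# A toy bowl-shaped error surface whose minimum is at (7, 3).
bowl = lambda x, y: (x - 7) ** 2 + (y - 3) ** 2
print(descend(bowl, 2, 9))  # (7, 3, 0)
```

As the text notes, such a search only succeeds when the starting offset already lies within the basin around the minimum, which is why the FFT estimate is computed first.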

Note that with increasing box-radius, the fit-box dimensions increase and consequently the size and detail of an image feature contained therein increase as well; the calculated fit accuracy is therefore improved with a larger box and more data to work with, but the computation time per fit (error) calculation increases as the square of the radius. For any computer-generated element mask area pixel that is found at a particular pixel x, y location, that location is taken to be the edge of the overlying operator-defined mask, and mask fitting continues at other pixel locations until all pixels of the mask are checked.

Previous vs. Propagated Reference Images: The reference image utilized for mask fitting is usually an adjacent frame in a film image-frame sequence. However, it is sometimes preferable to use an exquisitely fit mask as a reference image (e.g. a key frame mask, or the source frame from which mask regions were propagated/copied). The present embodiment provides a switch to disable “adjacent” reference frames, using the propagated masks of the reference image if that frame is defined by a recent propagation event.

The process of mask fitting: In the present embodiment the operator loads n frames into the display buffer. One frame includes the masks that are to be propagated and fitted to all other frames. All or some of the mask(s) are then propagated to all frames in the display buffer. Since the mask-fitting algorithm references the preceding frame or the first frame in the series for fitting masks to the subsequent frame, the first frame masks and/or preceding masks must be tightly applied to the objects and/or regions of interest. If this is not done, mask errors will accumulate and mask fitting will break down. The operator displays the subsequent frame, adjusts the sampling radius of the fit and executes a command to calculate mask fitting for the entire frame. The execution command can be a keystroke or mouse-hotkey command.

FIG. 13 shows a propagated mask in the first sequential instance, where there is little discrepancy between the underlying image data and the mask data. The dress mask and hand mask can be clearly seen to be off relative to the image data.

FIG. 14 shows that by using the automatic mask fitting routine, the mask data adjusts to the image data by referencing the underlying image data in the preceding image.

In FIG. 15, the mask data in later images within the sequence shows marked discrepancy relative to the underlying image data. Eye makeup, lipstick, blush, hair, face, dress and hand image data are all displaced relative to the mask data.

As shown in FIG. 16, the mask data is adjusted automatically based on the underlying image data from the previous mask and underlying image data. In this figure, the mask data is shown with random colors to show the regions that were adjusted automatically based on underlying pattern and luminance data. The blush and eye makeup did not have edge data to reference and were auto-adjusted on the basis of luminance and grayscale pattern.

In FIG. 17, mask data from FIG. 16 is shown with appropriate color transforms after whole frame automatic mask fitting. The mask data is adjusted to fit the underlying luminance pattern based on data from the previous frame or from the initial key frame.

Mask Propagation With Bezier and Polygon Animation Using Edge Snap: Masks for motion objects can be animated using either Bezier curves or polygons that enclose a region of interest. A plurality of frames are loaded into display memory and either Bezier points and curves or polygon points are applied close to the region of interest where the points automatically snap to edges detected within the image data. Once the object in frame one has been enclosed by the polygon or Bezier curves, the operator adjusts the polygon or Bezier in the last frame of the frames loaded in display memory. The operator then executes a fitting routine, which snaps the polygons or Bezier points plus control curves to all intermediate frames, animating the mask over all frames in display memory. The polygon and Bezier algorithms include control points for rotation, scaling and move-all to handle camera zooms, pans and complex camera moves.

In FIG. 18, polygons are used to outline a region of interest for masking in frame one. The square polygon points snap to the edges of the object of interest. Using a Bezier curve, the Bezier points snap to the object of interest and the control points/curves shape to the edges.

As disclosed in FIG. 19, the entire polygon or Bezier curve is carried to a selected last frame in the display memory where the operator adjusts the polygon points or Bezier points and curves using the snap function, which automatically snaps the points and curves to the edges of the object of interest.

As shown in FIG. 20, if there is a marked discrepancy between the points and curves in frames between the two frames where there was an operator interactive adjustment, the operator will further adjust a frame in the middle of the plurality of frames where there is maximum error of fit.

As shown in FIG. 21, when it is determined that the polygons or Bezier curves are correctly animating between the two adjusted frames, the appropriate masks are applied to all frames. In these figures, the arbitrary mask color is seen filling the polygon or Bezier curves.

FIG. 22 shows the resulting masks from a polygon or Bezier animation with automatic point and curve snap to edges. The brown masks are the color transforms and the green masks are the arbitrary color masks. For depth projects, areas that have been depth assigned may be of one color while those areas that have yet to be depth assigned may be of another color, for example.

Colorization/Depth Enhancement of Backgrounds in feature films and television episodes: The process of applying mask information to sequential frames in a feature film or television episode is known, but is laborious for a number of reasons. In all cases, these processes involve the correction of mask information from frame to frame to compensate for the movement of underlying image data. The correction of mask information not only includes the re-masking of actors and other moving objects within a scene or cut but also correction of the background and foreground information that the moving objects occlude or expose during their movement. This has been particularly difficult in camera pans where the camera follows the action to the left, right, up or down in the scene cut. In such cases the operator must not only correct for movement of the motion object, the operator must also correct for occlusion and exposure of the background information plus correct for the exposure of new background information as the camera moves to new parts of the background and foreground. Typically these instances greatly increase the time and difficulty factor of colorizing a scene cut due to the extreme amount of manual labor involved. Embodiments of the invention include a method and process for automatically colorizing/depth enhancing a plurality of frames in scene cuts that include complex camera movements as well as scene cuts where there is camera weave or drifting camera movement that follows erratic action of the motion objects.

Camera Pans: For a pan camera sequence, the background associated with non-moving objects in a scene forms a large part of the sequence. In order to colorize/depth enhance a large amount of background objects for a pan sequence, a mosaic that includes the background objects for an entire pan sequence with moving objects removed is created. This task is accomplished with a pan background stitcher tool. Once a background mosaic of the pan sequence is generated, it can be colorized/depth enhanced once and applied to the individual frames automatically, without having to manually colorize/depth assign the background objects in each frame of the sequence.

The pan background stitcher tool generates a background image of a pan sequence using two general operations. First, the movement of the camera is estimated by calculating the transformation needed to align each frame in the sequence with the previous frame. Since moving objects form a large portion of cinematic sequences, techniques are used that minimize the effects of moving objects on the frame registration. Second, the frames are blended into a final background mosaic by interactively selecting two pass blending regions that effectively remove moving objects from the final mosaic.

Background composite output data includes a greyscale (or possibly color for depth projects) image file of standard digital format such as a TIFF image file (bkg.*.tif) comprised of a background image of the entire pan shot, with the desired moving objects removed, ready for color design/depth assignments using the masking operations already described, and an associated background text data file needed for background mask extraction after associated background mask/colorization/depth data components (bkg.*.msk, bkg.*lut, . . . ) have been established. The background text data file provides filename, frame position within the mosaic, and other frame-dimensioning information for each constituent (input) frame associated with the background, with the following per line (per frame) content: Frame-filename, frame-x-position, frame-y-position, frame-width, frame-height, frame-left-margin-x-max, frame-right-margin-x-min. Each of the data fields is an integer except for the first (frame-filename), which is a string.

Generating Transforms: In order to generate a background image for a pan camera sequence, the motion of the camera is first calculated. The motion of the camera is determined by examining the transformation needed to bring one frame into alignment with the previous frame. By calculating the movement for each pair of consecutive frames in the sequence, a map of transformations giving each frame's relative position in the sequence can be generated.
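Building the map of frame positions from pairwise movements can be sketched as a running sum. The sketch assumes pure translations expressed as (dx, dy) pixel offsets with the first frame placed at the origin; the function name is hypothetical.

```python
from itertools import accumulate


def frame_positions(pairwise_offsets):
    """Turn per-pair (dx, dy) offsets into absolute positions, frame 0 at (0, 0)."""
    xs = accumulate((dx for dx, _ in pairwise_offsets), initial=0)
    ys = accumulate((dy for _, dy in pairwise_offsets), initial=0)
    return list(zip(xs, ys))


# Four consecutive frame pairs of a horizontal pan.
offsets = [(7, 0), (13, 0), (17, 0), (21, 0)]
print(frame_positions(offsets))
# [(0, 0), (7, 0), (20, 0), (37, 0), (58, 0)]
```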

Translation Between Image Pairs: Most image registration techniques use some form of intensity correlation. Unfortunately, methods based on pixel intensities will be biased by any moving objects in the scene, making it difficult to estimate the movement due to camera motion. Feature based methods have also been used for image registration. These methods are limited by the fact that most features occur on the boundaries of moving objects, also giving inaccurate results for pure camera movement. Manually selecting feature points for a large number of frames is also too costly.

The registration method used in the pan stitcher uses properties of the Fourier transform in order to avoid bias towards moving objects in the scene. Automatic registration of frame pairs is calculated and used for the final background image assembly.

Fourier Transform of an Image Pair: The first step in the image registration process consists of taking the Fourier transform of each image. The camera motion can be estimated as a translation. The second image is translated by a certain amount given by:

$\begin{matrix}{I_{2}(x,y) = I_{1}(x - x_{0},\, y - y_{0}).} & (1)\end{matrix}$

Taking the Fourier transform of each image in the pair yields the following relationship:

$\begin{matrix}{F_{2}(\alpha,\beta) = e^{-j \cdot 2\pi \cdot (\alpha x_{0} - \beta y_{0})} \cdot F_{1}(\alpha,\beta).} & (2)\end{matrix}$

Phase Shift Calculation: The next step involves calculating the phase shift between the images. Doing this results in an expression for the phase shift in terms of the Fourier transform of the first and second image:

$\begin{matrix}{e^{-j \cdot 2\pi \cdot (\alpha x_{0} - \beta y_{0})} = \frac{F_{1}^{*} \cdot F_{2}}{\left| F_{1}^{*} \cdot F_{2} \right|}.} & (3)\end{matrix}$

Inverse Fourier Transform: Taking the inverse Fourier transform of the phase shift calculation given in (3) results in a delta function whose peak is located at the translation of the second image.

$\begin{matrix}{\delta(x - x_{0},\, y - y_{0}) = F^{-1}\left[ e^{-j \cdot 2\pi \cdot (\alpha x_{0} - \beta y_{0})} \right] = F^{-1}\left[ \frac{F_{1}^{*} \cdot F_{2}}{\left| F_{1}^{*} \cdot F_{2} \right|} \right]} & (4)\end{matrix}$

Peak Location: The two-dimensional surface that results from (4) will have a maximum peak at the translation point from the first image to the second image. By searching for the largest value in the surface, it is simple to find the transform that represents the camera movement in the scene. Although there will be spikes present due to moving objects, the dominant motion of the camera should represent the largest peak value. This calculation is performed for every consecutive pair of frames in the entire pan sequence.
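The registration steps in (1) through (4) may be sketched as below. To stay self-contained the sketch uses a naive discrete Fourier transform written with the standard library, which is practical only for tiny images, and it follows the standard DFT sign convention; a real implementation would use an FFT. Translations are treated as circular (wrapping at the image border), and all function names are hypothetical.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2D discrete Fourier transform, O(N^4); adequate only for tiny images."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * cmath.pi * (u * x / w + v * y / h)
                    acc += img[y][x] * cmath.exp(1j * angle)
            out[v][u] = acc / (h * w) if inverse else acc
    return out


def phase_correlate(img1, img2):
    """Return (x0, y0) with img2(x, y) = img1(x - x0, y - y0), modulo the image size."""
    f1, f2 = dft2(img1), dft2(img2)
    h, w = len(img1), len(img1[0])
    cross = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            c = f1[v][u].conjugate() * f2[v][u]
            cross[v][u] = c / abs(c) if abs(c) > 1e-12 else 0j  # keep phase only
    surface = dft2(cross, inverse=True)
    # The largest value of the correlation surface marks the translation.
    _, x0, y0 = max((surface[y][x].real, x, y) for y in range(h) for x in range(w))
    return x0, y0


def circular_shift(img, x0, y0):
    h, w = len(img), len(img[0])
    return [[img[(y - y0) % h][(x - x0) % w] for x in range(w)] for y in range(h)]


frame1 = [[0.0] * 8 for _ in range(8)]
frame1[3][2] = 5.0
frame1[6][1] = 2.0
frame2 = circular_shift(frame1, 3, 1)
print(phase_correlate(frame1, frame2))  # (3, 1)
```

Because only the phase of the cross spectrum is kept, the peak location depends on the shift rather than on image brightness.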

Dealing with Image Noise: Unfortunately, spurious results can occur due to image noise which can drastically change the results of the transform calculation. The pan background stitcher deals with these outliers using two methods that detect and correct erroneous cases: closest peak matching and interpolated positions. If these corrections fail for a particular image pair, the stitching application has an option to manually correct the position of any pair of frames in the sequence.

Closest Matching Peak: After the transform is calculated for an image pair, the percent difference between this transform and the previous transform is determined. If the difference is higher than a predetermined threshold, then a search for neighboring peaks is done. If a peak is found that is a closer match and below the difference threshold, then this value is used instead of the highest peak value.

This assumes that for a pan camera shot, the motion will be relatively steady, and the differences between motions for each frame pair will be small. This corrects for the case where image noise may cause a peak that is slightly higher than the true peak corresponding to the camera transformation.

Interpolating Positions: If the closest matching peak calculation fails to yield a reasonable result given by the percent difference threshold, then the position is estimated based on the result from the previous image pair. Again, this gives generally good results for a steady pan sequence since the difference between consecutive camera movements should be roughly the same. The peak correlation values and interpolated results are shown in the stitching application, so manual correction can be done if needed.
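The two corrections can be combined into one selection routine, sketched below. The specification does not define how the percent difference is computed, what the threshold is, or how the position is estimated from the previous pair; the sum-of-absolute-differences measure, the 50 percent threshold and the reuse of the previous offset are all assumptions, as are the function names.

```python
def percent_difference(a, b):
    """Percent difference of offset a relative to offset b, both (dx, dy) tuples."""
    scale = max(abs(b[0]) + abs(b[1]), 1e-9)
    return 100.0 * (abs(a[0] - b[0]) + abs(a[1] - b[1])) / scale


def choose_offset(peaks, previous, threshold=50.0):
    """Pick the offset for one frame pair.

    peaks: list of (value, (dx, dy)) candidates taken from the correlation surface.
    previous: offset accepted for the previous frame pair.
    """
    ranked = sorted(peaks, key=lambda p: p[0], reverse=True)
    best = ranked[0][1]
    if percent_difference(best, previous) <= threshold:
        return best, "highest peak"
    # Closest matching peak: search the remaining peaks for one within the threshold.
    close = [off for _, off in ranked[1:] if percent_difference(off, previous) <= threshold]
    if close:
        return min(close, key=lambda off: percent_difference(off, previous)), "closest peak"
    # Interpolated position: fall back to the previous pair's offset.
    return previous, "interpolated"


peaks = [(0.90, (40, 3)), (0.85, (11, 0)), (0.20, (0, 0))]
print(choose_offset(peaks, previous=(10, 0)))  # ((11, 0), 'closest peak')
```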

Generating the Background: Once the relative camera movement for each consecutive frame pair has been calculated, the frames can be composited into a mosaic which represents the entire background for the sequence. Since the moving objects in the scene need to be removed, different image blending options are used to effectively remove the dominant moving objects in the sequence.

Assembling the Background Mosaic: First a background image buffer is generated which is large enough to span the entire sequence. The background can be blended together in a single pass, or if moving objects need to be removed, a two-pass blend is used, which is detailed below. The position and width of the blend can be edited in the stitching application and can be set globally or individually for each frame pair. Each blend is accumulated into the final mosaic and then written out as a single image file.

Two Pass Blending: The objective in two-pass blending is to eliminatemoving objects from the final blended mosaic. This can be done by firstblending the frames so the moving object is completely removed from theleft side of the background mosaic. An example is shown in FIG. 23,where the character can is removed from the scene, but can still be seenin the right side of the background mosaic. FIG. 23. In the first passblend shown in FIG. 23, the moving character is shown on the stairs tothe right

A second background mosaic is then generated, where the blend positionand width is used so that the moving object is removed from the rightside of the final background mosaic. An example of this is shown in FIG.24, where the character can is removed from the scene, but can still beseen the left side of the background mosaic. In the second pass blend asshown in FIG. 24, the moving character is shown on the left.

Finally, the two-passes are blended together to generate the finalblended background mosaic with the moving object removed from the scene.The final background corresponding to FIGS. 23 and 24 is shown in FIG.25. As shown in FIG. 25, the final blended background with movingcharacter is removed.

In order to facilitate effective removal of moving objects, which can occupy different areas of the frame during a pan sequence, the stitcher application has an option to interactively set the blending width and position for each pass and each frame individually or globally. FIG. 26 is an example screen shot of the blend-editing tool, showing the first and second pass blend positions and widths.
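As an illustration of how a blend position and width might control the transition between two aligned images, the following is a minimal sketch and not the stitching application's actual code. It assumes a linear ramp centred on the blend position. The function names `blend_weight` and `blend_images` and the list-of-rows image representation are hypothetical.

```python
def blend_weight(x, position, width):
    """Weight given to the second image at column x (0.0 to 1.0)."""
    if width <= 0:
        # Degenerate case: a hard cut at the blend position.
        return 0.0 if x < position else 1.0
    # Linear ramp that starts at position - width/2 and ends at position + width/2.
    t = (x - (position - width / 2)) / width
    return min(1.0, max(0.0, t))


def blend_images(first, second, position, width):
    """Cross-fade two equally sized, already aligned grayscale images."""
    blended = []
    for row_a, row_b in zip(first, second):
        out = []
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            w = blend_weight(x, position, width)
            out.append((1.0 - w) * a + w * b)
        blended.append(out)
    return blended
```

In a two-pass setting, `first` could be the mosaic whose left side is free of the moving object and `second` the mosaic whose right side is free of it, with the blend position chosen between the two remaining copies of the object.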

Background Text Data Save: An output text data file is saved containing parameter values relevant for background mask extraction, as generated from the initialization phase described above. As mentioned above, each text data record includes: Frame-filename frame-x-position frame-y-position frame-width frame-height frame-left-margin-x-max frame-right-margin-x-min.

The output text data filename is composed from the first composite input frame rootname by prepending the “bkg.” prefix and appending the “.txt” extension.

Example

Representative lines of an output text data file called “bkg.4.00233.txt” that may include data from 300 or more frames making up the blended image:

4.00233.tif 0 0 1436 1080 0 1435

4.00234.tif 7 0 1436 1080 0 1435

4.00235.tif 20 0 1436 1080 0 1435

4.00236.tif 37 0 1436 1080 0 1435

4.00237.tif 58 0 1436 1080 0 1435

Image offset information used to create the composite representation of the series of frames is contained within a text file associated with the composite image and used to apply the single composite mask to all the frames used to create the composite image.
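The record layout and file naming rule described above can be sketched as follows. This is a minimal illustration, assuming whitespace-separated fields and a rootname obtained by dropping the final extension. The names `FrameRecord`, `parse_record` and `background_text_name` are hypothetical.

```python
from typing import NamedTuple


class FrameRecord(NamedTuple):
    filename: str
    x_position: int
    y_position: int
    width: int
    height: int
    left_margin_x_max: int
    right_margin_x_min: int


def parse_record(line):
    """Parse one per-frame line of the background text data file."""
    # Split from the right so that a filename containing spaces stays intact.
    parts = line.rsplit(None, 6)
    if len(parts) != 7:
        raise ValueError("expected a filename followed by six integers")
    name, *numbers = parts
    return FrameRecord(name, *(int(n) for n in numbers))


def background_text_name(first_frame_filename):
    """Prepend the "bkg." prefix to the rootname and append ".txt"."""
    rootname = first_frame_filename.rsplit(".", 1)[0]
    return "bkg." + rootname + ".txt"
```

For instance, `parse_record("4.00234.tif 7 0 1436 1080 0 1435")` yields a record whose `x_position` is 7, which is the horizontal offset of that frame within the composite.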

In FIG. 27, sequential frames representing a camera pan are loaded into memory. The motion object (butler moving left to the door) has been masked with a series of color transform information leaving the background black and white with no masks or color transform information applied. Alternatively for depth projects, the motion object may be assigned a depth and/or depth shape. See FIGS. 42-70.

In FIG. 28, six representative sequential frames of the pan above are displayed for clarity.

FIG. 29 shows the composite or montage image of the entire camera pan that was built using phase correlation techniques. The motion object (butler) is included as a transparency for reference by keeping the first and last frame and averaging the phase correlation in two directions. The single montage representation of the pan is color designed using the same color transform masking techniques as used for the foreground object.

FIG. 30 shows the sequence of frames in the camera pan after the background mask color transforms of the montage have been applied to each frame used to create the montage. The mask is applied where there is no pre-existing mask, thus retaining the motion object mask and color transform information while applying the background information with appropriate offsets. Alternatively for depth projects, the left and right eye views of each frame may be shown as pairs, or in a separate window for each eye for example. Furthermore, the images may be displayed on a three-dimensional viewing display as well.

FIG. 31 shows a selected sequence of frames in the pan, for clarity, after the color background/depth enhanced background masks have been automatically applied to the frames where there are no pre-existing masks.

Static and drifting camera shots: Objects which are not moving and changing in a film scene cut can be considered “background” objects, as opposed to moving “foreground” objects. If a camera is not moving throughout a sequence of frames, associated background objects appear to be static for the sequence duration, and can be masked and colorized only once for all associated frames. This is the “static camera” (or “static background”) case, as opposed to the moving (e.g. panning) camera case, which requires the stitching tool described above to generate a background composite.

Cuts or frame sequences involving little or no camera motion provide the simplest case for generating frame-image background “composites” useful for cut background colorization. However, since even a “static” camera experiences slight vibrations for a variety of reasons, the static background composition tool cannot assume perfect pixel alignment from frame to frame. It requires an assessment of inter-frame shifts, accurate to 1 pixel, in order to optimally associate pixels between frames prior to adding their data contribution into the composite (an averaged value). The Static Background Composite tool provides this capability, generating all the data necessary to later colorize and extract background colorization information for each of the associated frames.

Moving foreground objects such as actors, etc., are masked leaving the background and stationary foreground objects unmasked. Wherever the masked moving object exposes the background or foreground, the instance of background and foreground previously occluded is copied into the single image with priority and proper offsets to compensate for movement. The offset information is included in a text file associated with the single representation of the background so that the resulting mask information can be applied to each frame in the scene cut with proper mask offsets.

Background composite output data uses a greyscale TIFF image file (bkg.*.tif) that includes averaged input background pixel values lending itself to colorization/depth enhancement, and an associated background text data file required for background mask extraction after associated background mask/colorization data/depth enhancement components (bkg.*.msk, bkg.*.lut, . . . ) have been established. Background text data provides filename, mask-offset, and other frame-dimensioning information for each constituent (input) frame associated with the composite, with the following per line (per frame) format: Frame-filename frame-x-offset frame-y-offset frame-width frame-height frame-left-margin-x-max frame-right-margin-x-min. Each of these data fields is an integer except for the first (frame-filename), which is a string.

Initialization: Initialization of the static background composition process involves initializing and acquiring the data necessary to create the composited background image-buffer and -data. This requires a loop over all constituent input image frames. Before any composite data initialization can occur, the composite input frames must be identified, loaded, and have all foreground objects identified/colorized (i.e. tagged with mask labels, for exclusion from composite). These steps are not part of the static background composition procedure, but occur prior to invoking the composite tool, after browsing a database or directory tree, selecting and loading relevant input frames, and painting/depth assigning the foreground objects.

Get Frame Shift: Adjacent frames' image background data in a static camera cut may exhibit small mutual vertical and horizontal offsets. Taking the first frame in the sequence as a baseline, all successive frames' background images are compared to the first frame's, fitting line-wise and column-wise, to generate two histograms of “measured” horizontal and vertical offsets, from all measurable image-lines and -columns. The modes of these histograms provide the most frequent (and likely) assessed frame offsets, identified and stored in arrays DVx[iframe], DVy[iframe] per frame [iframe]. These offset arrays are generated in a loop over all input frames.
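The histogram-mode idea can be sketched as follows. This is a minimal illustration under stated assumptions and not the routine of the figures: the line-wise and column-wise "fit" is taken here to be a mean absolute difference over a small search range, a positive shift means the frame content lies to the right of (or below) the baseline content, and the names `best_shift` and `get_frame_shift` are hypothetical. Exclusion of masked foreground pixels is omitted for brevity.

```python
from collections import Counter


def best_shift(reference, candidate, max_shift=3):
    """Shift s that minimises the mean |reference[i] - candidate[i + s]|."""
    best, best_error = 0, float("inf")
    n = len(reference)
    for s in range(-max_shift, max_shift + 1):
        diffs = [abs(reference[i] - candidate[i + s])
                 for i in range(n) if 0 <= i + s < n]
        if not diffs:
            continue
        error = sum(diffs) / len(diffs)
        if error < best_error:
            best, best_error = s, error
    return best


def get_frame_shift(baseline, frame, max_shift=3):
    """Return (DVx, DVy) as the modes of per-line and per-column shifts."""
    # One horizontal measurement per image line.
    line_shifts = [best_shift(b, f, max_shift)
                   for b, f in zip(baseline, frame)]
    # One vertical measurement per image column.
    column_shifts = [best_shift(b, f, max_shift)
                     for b, f in zip(zip(*baseline), zip(*frame))]
    # The mode of each histogram is taken as the frame offset.
    dvx = Counter(line_shifts).most_common(1)[0][0]
    dvy = Counter(column_shifts).most_common(1)[0][0]
    return dvx, dvy
```

Taking the mode rather than the mean keeps a few badly fitting lines or columns, such as empty margin columns, from skewing the estimate.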

Get Maximum Frame Shift: While looping over input frames during initialization to generate the DVx[ ], DVy[ ] offset array data, the absolute maximum DVxMax, DVyMax values are found from the DVx[ ], DVy[ ] values. These are required when appropriately dimensioning the resultant background composite image to accommodate all composited frames' pixels without clipping.

Get Frame Margin: While looping over input frames during initialization, an additional procedure is invoked to find the right edge of the left image margin as well as the left edge of the right image margin. As pixels in the margins have zero or near-zero values, the column indexes to these edges are found by evaluating average image-column pixel values and their variations. The edge column-indexes are stored in arrays lMarg[iframe] and rMarg[iframe] per frame [iframe], respectively.
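A simple way to locate the margin edges from column averages is sketched below. It is an assumption-laden illustration: a fixed threshold on the column mean stands in for the evaluation of "average image-column pixel values and their variations", and the returned indexes are taken to be the first and last content columns, which matches the 0 and width-1 values seen in the example records when no margin is present. The name `get_frame_margin` and the `threshold` parameter are hypothetical.

```python
def get_frame_margin(image, threshold=1.0):
    """Return (left_edge, right_edge) column indexes of the image content."""
    height = len(image)
    # Average pixel value of every column.
    column_means = [sum(column) / height for column in zip(*image)]
    # Columns whose average rises above the near-zero margin level.
    content = [i for i, mean in enumerate(column_means) if mean > threshold]
    if not content:
        raise ValueError("no column exceeds the margin threshold")
    return content[0], content[-1]
```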

Extend Frame Shifts with Maximum: The Frame Shifts evaluated in the GetFrameShift( ) procedure described are relative to the “baseline” first frame of a composited frame sequence, whereas the sought frame shift values are shifts/offsets relative to the resultant background composite frame. The background composite frame's dimensions equal the first composite frame's dimensions extended by horizontal and vertical margins on all sides with widths DVxMax, DVyMax pixels, respectively. Frame offsets must therefore include margin widths relative to the resultant background frame, and therefore need to be added, per iframe, to the calculated offset from the first frame:

DVx[iframe]=DVx[iframe]+DVxMax

DVy[iframe]=DVy[iframe]+DVyMax
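The two updates above, together with the maximum-shift step, can be written as a short sketch. The function names `extend_frame_shifts` and `composite_size` are hypothetical; the composite size rule assumes margins of DVxMax and DVyMax on every side, as the text describes.

```python
def extend_frame_shifts(dvx, dvy):
    """Make per-frame offsets relative to the composite instead of frame one."""
    dvx_max = max(abs(v) for v in dvx)
    dvy_max = max(abs(v) for v in dvy)
    # Adding the maximum shift makes every offset non-negative.
    extended_x = [v + dvx_max for v in dvx]
    extended_y = [v + dvy_max for v in dvy]
    return extended_x, extended_y, dvx_max, dvy_max


def composite_size(frame_width, frame_height, dvx_max, dvy_max):
    """Frame dimensions extended by the maximum shift on all sides."""
    return frame_width + 2 * dvx_max, frame_height + 2 * dvy_max
```

With shifts of 0, 1 and -2 pixels in x, for example, DVxMax is 2 and the extended offsets become 2, 3 and 0, so the most negative shift lands at the composite's left edge.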

Initialize Composite Image: An image-buffer class object instance is created for the resultant background composite. The resultant background composite has the dimensions of the first input frame increased by 2*DVxMax (horizontally) and 2*DVyMax (vertically) pixels, respectively. The first input frame background image pixels (mask-less, non-foreground pixels) are copied into the background image buffer with the appropriate frame offset. Associated pixel composite count buffer values are initialized to one (1) for pixels receiving an initialization, zero (0) otherwise. See FIG. 38A for the flow of the processing for extracting a background, which occurs by generating a frame mask for all frames of a scene for example. FIG. 38B illustrates the determination of the amount of Frame shift and margin that is induced for example by a camera pan. The composite image is saved after determining and overlaying the shifted images from each of the desired frames for example.

FIG. 39A shows the edgeDetection and determination of points to snap to (1.1 and 1.2 respectively), which are detailed in FIGS. 39B and 39C respectively and which enable one skilled in the art to implement an image edge detection routine via Average Filter, Gradient Filter, Fill Gradient Image and a comparison with a Threshold. In addition, the GetSnapPoint routine of FIG. 39C shows the determination of a NewPoint based on the BestSnapPoint as determined by the RangeImage less than MinDistance as shown.

FIGS. 40A-C show how a bimodal threshold tool is implemented in one or more embodiments of the invention. Creation of an image of light and dark cursor shape is implemented with the MakeLightShape routine wherein the light/dark values for the shape are applied with the respective routine as shown at the end of FIG. 40A. These routines are shown in FIGS. 40C and 40B respectively. FIGS. 41A-B show the calculation of FitValues and gradients for use in one or more of the above routines.

Composite Frame Loop: Input frames are composited (added) sequentially into the resultant background via a loop over the frames. Input frame background pixels are added into the background image buffer with the relevant offset (DVx[iframe], DVy[iframe]) for each frame, and associated pixel composite count values are incremented by one (1) for pixels receiving a composite addition (a separate composite count array/buffer is provided for this). Only background pixels, those without an associated input mask index, are composited (added) into the resultant background; pixels with nonzero (labeled) mask values are treated as foreground pixels and are therefore ignored rather than composited into the background. A status bar in the GUI is incremented per pass through the input frame loop.

Composite Finish: The final step in generating the output composite image buffer requires evaluating pixel averages which constitute the composite image. Upon completion of the composite frame loop, a background image pixel value represents the sum of all contributing aligned input frame pixels. Since resultant output pixels must be an average of these, division by a count of contributing input pixels is required. The count per pixel is provided by the associated pixel composite count buffer, as mentioned. All pixels with nonzero composite counts are averaged; other pixels remain zero.
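The accumulate-then-average procedure can be illustrated with a minimal sketch. It assumes images and masks are lists of rows, that a mask value of zero marks a background pixel, and that the offsets passed in are the extended (non-negative) DVx, DVy values. The name `composite_background` is hypothetical, and the first frame is treated like any other frame rather than being used to initialize the buffers.

```python
def composite_background(frames, masks, dvx, dvy, width, height):
    """Average unmasked pixels of all frames into one background image."""
    total = [[0] * width for _ in range(height)]   # running pixel sums
    count = [[0] * width for _ in range(height)]   # contributions per pixel
    for frame, mask, ox, oy in zip(frames, masks, dvx, dvy):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                if mask[y][x] == 0:                # background pixels only
                    total[y + oy][x + ox] += value
                    count[y + oy][x + ox] += 1
    # Composite finish: divide each sum by its count; untouched pixels stay 0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0
             for x in range(width)]
            for y in range(height)]
```

Pixels that every frame masks as foreground never receive a contribution and stay zero, which is what produces the black areas of missing background data in the composite.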

Composite Image Save: A TIFF format output gray-scale image with 16 bits per pixel is generated from the composite-averaged background image buffer. The output filename is composed from the first composite input frame filename by prepending the “bkg.” prefix (and appending the usual “.tif” image extension if required), and writing to the associated background folder at path “ . . . /Bckgrnd Frm”, if available, otherwise to the default path (same as input frames′).

Background Text Data Save: An output text data file is saved containing parameter values relevant for background mask extraction, as generated from the initialization phase described in (40A-C). As mentioned in the introduction (see FIG. 39A), each text data record consists of: Frame-filename frame-x-offset frame-y-offset frame-width frame-height frame-left-margin-x-max frame-right-margin-x-min.

The output text data filename is composed from the first composite input frame rootname by prepending the “bkg.” prefix and appending the “.txt” extension, and writing to the associated background folder at path “ . . . /Bckgrnd Frm”, if available, otherwise to the default path (same as input frames′).

Example

A complete output text data file called “bkg.02.00.06.02.txt”:

C:\New_Folder\Static_Backgrounding_Test\02.00.06.02.tif 1 4 1920 1080 0 1919

C:\New_Folder\Static_Backgrounding_Test\02.00.06.03.tif 1 4 1920 1080 0 1919

C:\New_Folder\Static_Backgrounding_Test\02.00.06.04.tif 1 3 1920 1080 0 1919

C:\New_Folder\Static_Backgrounding_Test\02.00.06.05.tif 2 3 1920 1080 0 1919

C:\New_Folder\Static_Backgrounding_Test\02.00.06.06.tif 1 3 1920 1080 0 1919

Data Cleanup: Releases memory allocated to data objects used by the static background composite procedure. These include the background composite GUI dialog object and its member arrays DVx[ ], DVy[ ], lMarg[ ], rMarg[ ], and the background composite image buffer object, whose contents have previously been saved to disk and are no longer needed.

Colorization/Depth Assignment of the Composite Background

Once the background is extracted as described above, the single frame can be masked by an operator.

The offset data for the background composite is transferred to the mask data overlaying the background such that the mask for each successive frame used to create the composite is placed appropriately.

The background mask data is applied to each successive frame wherever there are no pre-existing masks (e.g. the foreground actors).
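The "apply where no mask exists" rule with per-frame offsets can be sketched as below. This is a minimal illustration, assuming a mask value of zero means "no pre-existing mask" and that `ox`, `oy` are the frame's offsets into the composite taken from the background text data. The name `apply_background_mask` is hypothetical.

```python
def apply_background_mask(frame_mask, background_mask, ox, oy):
    """Fill unmasked frame pixels from the offset composite background mask."""
    return [[label if label != 0 else background_mask[y + oy][x + ox]
             for x, label in enumerate(row)]
            for y, row in enumerate(frame_mask)]
```

Foreground labels such as those of the actors are left untouched, so only the previously unmasked pixels pick up the background mask.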

FIG. 32 shows a sequence of frames in which all moving objects (actors) are masked with separate color transforms/depth enhancements.

FIG. 33 shows a sequence of selected frames for clarity prior to background mask information. All motion elements have been fully masked using the automatic mask-fitting algorithm.

FIG. 34 shows the stationary background and foreground information minus the previously masked moving objects. In this case, the single representation of the complete background has been masked with color transforms in a manner similar to the motion objects. Note that outlines of removed foreground objects appear truncated and unrecognizable due to their motion across the input frame sequence interval, i.e., the black objects in the frame represent areas in which the motion objects (actors in this case) never expose the background and foreground, i.e., missing background image data 3401. The black objects are ignored for colorization-only projects during the masking operation because the resulting background mask is later applied to all frames used to create the single representation of the background only where there is no pre-existing mask. For depth related projects, the black objects where missing background image data 3401 exists may be artistically or realistically rendered, for example to fill in information to be utilized in the conversion of two-dimensional images into three-dimensional images. Since these are areas where pixels may not be borrowed from other frames, as they are never exposed in a scene, drawing them or otherwise creating believable images there allows for all background information to be present and used for artifact free two-dimensional to three-dimensional conversion. For example, in order to create artifact-free three-dimensional image pairs from a two-dimensional image having areas that are never exposed in a scene, backgrounds having all or enough required information for the background areas that are always occluded may be generated. The missing background image data 3401 may be painted, drawn, created, computer-generated or otherwise obtained from a studio for example, so that there is enough information in a background, including the black areas, to translate foreground objects horizontally and borrow generated background data for the translated edges for occluded areas. This enables the generation of artifact free three-dimensional image pairs since translation of foreground objects horizontally, which may expose areas that are always occluded in a scene, results in the use of the newly created background data instead of stretching objects or morphing pixels, which creates artifacts that are human detectable errors. Hence, obtaining backgrounds with occluded areas filled in, either partially with enough horizontal realistic image data or fully with all occluded areas rendered into a realistic enough looking area, i.e., drawn and colorized and/or depth assigned, thus results in artifact free edges for depth enhanced frames. See also FIGS. 70 and 71-76 and the associated description respectively. Generation of missing background data may also be utilized to create artifact free edges along computer-generated elements as well.

FIG. 35 shows the sequential frames in the static camera scene cut after the background mask information has been applied to each frame with appropriate offsets and where there is no pre-existing mask information.

FIG. 36 shows a representative sample of frames from the static camera scene cut after the background information has been applied with appropriate offsets and where there is no pre-existing mask information.

Colorization Rendering: After color processing is completed for each scene, subsequent or sequential color motion masks and related lookup tables are combined within 24-bit or 48-bit RGB color space and rendered as TIF or TGA files. These uncompressed, high-resolution images are then rendered to various media such as HDTV, 35 mm negative film (via digital film scanner), or a variety of other standard and non-standard video and film formats for viewing and exhibit.

Process Flow:

Digitization, Stabilization and Noise Reduction:

1. 35 mm film is digitized to 1920×1080×10 in any one of several digital formats.

2. Each frame undergoes standard stabilization techniques to minimize natural weaving motion inherent in film as it traverses camera sprockets as well as any appropriate digital telecine technology employed. Frame-differencing techniques are also employed to further stabilize image flow.

3. Each frame then undergoes noise reduction to minimize random film grain and electronic noise that may have entered into the capture process.

Pre-Production Movie Dissection into Camera Elements and Visual Database Creation:

1. Each scene of the movie is broken down into background and foreground elements as well as movement objects using various subtraction, phase correlation and focal length estimation algorithms. Background and foreground elements may include computer-generated elements or elements that exist in the original movie footage for example.

2. Backgrounds and foreground elements in pans are combined into a single frame using uncompensated (lens) stitching routines.

3. Foregrounds are defined as any object and/or region that moves in the same direction as the background but may represent a faster vector because of its proximity to the camera lens. In this method pans are reduced to a single representative image, which contains all of the background and foreground information taken from a plurality of frames.

4. Zooms are sometimes handled as a tiled database in which a matrix is applied to key frames where vector points of reference correspond to feature points in the image and correspond to feature points on the applied mask on the composited mask encompassing any distortion.

5. A database is created from the frames making up the single representative or composited frame (i.e., each common and novel pixel during a pan is assigned to the plurality of frames from which they were derived or which they have in common).

6. In this manner, a mask overlay representing an underlying lookup table will be correctly assigned to the respective novel and common pixel representations of backgrounds and foregrounds in corresponding frames.

Pre-Production Design Background Design:

1. Each entire background is colorized/depth assigned as a single frame in which all motion objects are removed. Background masking is accomplished using a routine that employs standard paint, fill, digital airbrushing, transparency, texture mapping, and similar tools. Color selection is accomplished using a 24-bit color lookup table automatically adjusted to match the density of the underlying gray scale and luminance. Depth assignment is accomplished via assigning depths, assigning geometric shapes, entry of numeric values with respect to objects, or in any other manner in the single composite frame. In this way creatively selected colors/depths are applied that are appropriate for mapping to the range of gray scale/depth underlying each mask. The standard color wheel used to select color ranges detects the underlying grayscale dynamic range and determines the corresponding color range from which the designer may choose (i.e., only from those color saturations that will match the grayscale luminance underlying the mask).

2. Each lookup table allows for a multiplicity of colors applied to the range of gray scale values underlying the mask. The assigned colors will automatically adjust according to luminance and/or according to pre-selected color vectors compensating for changes in the underlying gray scale density and luminance.
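To make the idea of a lookup table that maps the gray scale range under a mask to a multiplicity of colors concrete, here is a minimal sketch. It assumes an 8-bit gray scale and a table built by linear interpolation between designer-chosen color stops; that construction, and the names `build_lut` and `colorize`, are illustrative assumptions and do not describe the format of the bkg.*.lut files.

```python
def build_lut(stops):
    """Build a 256-entry gray-to-RGB table from (gray, (r, g, b)) stops."""
    stops = sorted(stops)
    lut = []
    for gray in range(256):
        if gray <= stops[0][0]:
            lut.append(stops[0][1])        # clamp below the first stop
            continue
        if gray >= stops[-1][0]:
            lut.append(stops[-1][1])       # clamp above the last stop
            continue
        for (g0, c0), (g1, c1) in zip(stops, stops[1:]):
            if g0 <= gray <= g1:
                # Integer linear interpolation between neighbouring stops.
                lut.append(tuple(a + (b - a) * (gray - g0) // (g1 - g0)
                                 for a, b in zip(c0, c1)))
                break
    return lut


def colorize(gray_image, mask, label, lut):
    """Apply the table to pixels carrying the given mask label only."""
    return [[lut[g] if m == label else (g, g, g)
             for g, m in zip(gray_row, mask_row)]
            for gray_row, mask_row in zip(gray_image, mask)]
```

Because the color is looked up from the underlying gray value, brighter and darker pixels under the same mask receive correspondingly different colors without any further operator input.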

Pre-Production Design Motion Element Design:

1. Design motion object frames are created which include the entire scene background as well as a single representative moment of movement within the scene in which all characters and elements within the scene are present. These moving non-background elements are called Design Frame Objects (DFO).

2. Each DFO is broken down into design regions of interest (regions of interest) with special attention focused on contrasting elements within the DFOs that can readily be isolated using various gray scale and luminance analyses such as pattern recognition and/or edge detection routines. As existing color movies may be utilized for depth enhancement, regions of interest may be picked with color taken into account.

3. The underlying gray scale and luminance distribution of each masked region is displayed graphically, as are other gray scale analyses including pattern analysis, together with a graphical representation of the region's shape with area, perimeter and various weighting parameters.

4. Color selection is determined for each region of interest comprising each object based on appropriate research into the film genre, period, creative intention, etc. Using a 24-bit color lookup table automatically adjusted to match the density of the underlying gray scale and luminance, suitable and creatively selected colors are applied. The standard color wheel detects the underlying grayscale range and restricts the designer to choose only from those color saturations that will match the grayscale luminance underlying the mask. Depth assignments may be made or adjusted for depth projects until realistic depth is obtained for example.

5. This process continues until a reference design mask is created for all objects that move in the scene.

Pre-Production Design Key Frame Objects Assistant Designer:

1. Once all color selection/depth assignment is generally completed for a particular scene the design motion object frame is then used as a reference to create the larger number of key frame objects within the scene.

2. Key Frame Objects (all moving elements within the scene such as people, cars, etc. that do not include background elements) are selected for masking.

3. The determining factor for each successive key frame object is the amount of new information between one key frame and the next key frame object.

Method of Colorizing/Depth Enhancing Motion Elements in Successive Frames:

1. The Production Colorist (operator) loads a plurality of frames into the display buffer.

2. One of the frames in the display buffer will include a key frame from which the operator obtains all masking information. The operator makes no creative or color/depth decisions since all color transform information is encoded within the key frame masks.

3. The operator can toggle from the colorized or applied lookup tables to translucent masks differentiated by arbitrary but highly contrasting colors.

4. The operator can view the motion of all frames in the display buffer observing the motion that occurs in successive frames or they can step through the motion from one key frame to the next.

5. The operator propagates (copies) the key frame mask information to all frames in the display buffer.

6. The operator then executes the mask fitting routine on each frame successively. FIG. 37A shows the general mask fitting processing flow chart, which is broken into the subsequent detailed flow charts of FIGS. 37B and 37C. The program makes a best fit based on the grayscale/luminance, edge parameters and pattern recognition based on the gray scale and luminance pattern of the key frame or the previous frame in the display. For computer-generated elements, the mask fitting routines are skipped since the masks or alphas define digitally created (and hence non-operator-defined) edges that accurately define the computer-generated element boundaries. Mask fitting operations take into account the computer-generated element masks or alphas and stop when hitting the edge of a computer-generated element mask since these boundaries are accepted as accurate irrespective of grey-scale as per step 3710 of FIG. 37C. This enhances the accuracy of mask edges and reshapes when colors of a computer-generated element and operator-defined mask are of the same base luminance for example. As shown in FIG. 37A, the Mask Fit initializes the region and fit grid parameters, then calls the Calculate fit grid routine and then the Interpolate mask on fit grid routine, which execute on any computer as described herein, wherein the routines are specifically configured to calculate fit grids as specified in FIGS. 37B and 37C. The flow of processing of FIG. 37B from the Initialize region routine, to the initialization of image line and image column and reference image, flows into the CalculateFitValue routine, which calls the fit gradient routine, which in turn calculates xx and yy as the difference between the xfit, yfit and gradients for x and y. If the FitValue is greater than the fit, for x, y and xx and yy, then the xfit and yfit values are stored in the FitGrid. Otherwise, processing continues back at the fit gradient routine with new values for xfit and yfit. When the processing for the size of the Grid is complete for x and y, then the mask is interpolated as per FIG. 37C. After initialization, the indices i and j for the FitGridCell are determined and a bilinear interpolation is performed at the fitGridA-D locations wherein the Mask is fit up to any border found for any CG element at 3710 (i.e., for a known alpha border or border with depth values for example that define a digitally rendered element that is taken as a certified correct mask border). The mask fitting interpolation is continued up to the size of the mask defined by xend and yend.

7. In the event that movement creates large deviations in regions from one frame to the next the operator can select individual regions to mask-fit. The displaced region is moved to the approximate location of the region of interest where the program attempts to create a best fit. This routine continues for each region of interest in succession until all masked regions have been applied to motion objects in all sequential frames in the display memory.

a. The operator clicks on a single mask in each successive frame on the corresponding area where it belongs in frame 2. The computer makes a best fit based on the grayscale/luminance, edge parameters, gray scale pattern and other analysis.

b. This routine continues for each region in succession until all regions of interest have been repositioned in frame two.

c. The operator then indicates completion with a mouse click and masks in frame two are compared with gray scale parameters in frame three.

d. This operation continues until all motion in all frames between two or more key frames is completely masked.

8. Where there is an occlusion, a modified best-fit parameter is used. Once the occlusion is passed, the operator uses the pre-occlusion frame as a reference for the post occlusion frames.

9. After all motion is completed, the background/set mask is applied to each frame in succession. Application is: apply mask where no mask exists.

10. Masks for motion objects can also be animated using either Bezier curves or polygons that enclose a region of interest.

a. A plurality of frames are loaded into display memory and either Bezier points and curves or polygon points are applied close to the region of interest where the points automatically snap to edges detected within the image data.

b. Once the object in frame one has been enclosed by the polygon or Bezier curves the operator adjusts the polygon or Bezier in the last frame of the frames loaded in display memory.

c. The operator then executes a fitting routine, which snaps the polygons or Bezier points plus control curves to all intermediate frames, animating the mask over all frames in display memory.

d. The polygon and Bezier algorithms include control points for rotation, scaling and move-all to handle zooms, pans and complex camera moves where necessary.
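As an illustration of animating a polygon mask between the first and last frames of step 10, the sketch below computes a starting estimate for each intermediate frame by linear interpolation of corresponding points. This is an assumption made for illustration only: the fitting routine described above snaps points to detected image edges, which is not shown here, and the name `interpolate_polygon` is hypothetical.

```python
def interpolate_polygon(first, last, frame_index, frame_count):
    """Linearly interpolate polygon points between two key frames.

    first, last: lists of (x, y) points with matching order and length.
    frame_index: 0 for the first frame, frame_count - 1 for the last.
    """
    if len(first) != len(last):
        raise ValueError("key frame polygons need the same number of points")
    if frame_count < 2:
        raise ValueError("at least two frames are required")
    t = frame_index / (frame_count - 1)
    return [(x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
            for (x0, y0), (x1, y1) in zip(first, last)]
```

Each interpolated polygon would then serve only as the initial position from which points are snapped to nearby edges in that frame.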

FIG. 42 shows two image frames that are separated in time by severalframes, of a person levitating a crystal ball wherein the variousobjects in the image frames are to be converted from two-dimensionalobjects to three-dimensional objects. As shown the crystal ball moveswith respect to the first frame (shown on top) by the time that thesecond frame (shown on the bottom) occurs. As the frames are associatedwith one another, although separated in time, much of the maskinginformation can be utilized for both frames, as reshaped usingembodiments of the invention previously described above. For example,using the mask reshaping techniques described above for colorization,i.e., using the underlying grey-scale for tracking and reshaping masks,much of the labor involved with converting a two-dimensional movie to athree-dimensional movie is eliminated. This is due to the fact that oncekey frames have color or depth information applied to them, the maskinformation can be propagated automatically throughout a sequence offrames which eliminates the need to adjust wire frame models forexample. Although there are only two images shown for brevity, theseimages are separated by several other images in time as the crystal ballslowly moves to the right in the sequence of images.

FIG. 43 shows the masking of the first object in the first image framethat is to be converted from a two-dimensional image to athree-dimensional image. In this figure, the first object masked is thecrystal ball. There is no requirement to mask objects in any order. Inthis case a simple free form drawing tool is utilized to apply asomewhat round mask to the crystal ball. Alternatively, a circle maskmay be dropped on the image and resized and translated to the correctposition to correspond to the round crystal ball. However, since mostobjects masked are not simple geometric shapes, the alternative approachis shown herein. The grey-scale values of the masked object are thusutilized to reshape the mask in subsequent frames.

FIG. 44 shows the masking of the second object in the first image frame. In this figure, the hair and face of the person behind the crystal ball are masked as the second object using a free form drawing tool. Edge detection or grey-scale thresholds can be utilized to accurately set the edges of the masks as has been previously described above with respect to colorization. There is no requirement that an object be a single object, i.e., the hair and face of a person can be masked as a single item, or not, and depth can thus be assigned to both or individually as desired.

FIG. 45 shows the two masks in color in the first image frame allowing for the portions associated with the masks to be viewed. This figure shows the masks as colored transparent masks so that the masks can be adjusted if desired.

FIG. 46 shows the masking of the third object in the first image frame. In this figure the hand is chosen as the third object. A free form tool is utilized to define the shape of the mask.

FIG. 47 shows the three masks in color in the first image frame allowing for the portions associated with the masks to be viewed. Again, the masks can be adjusted if desired based on the transparent masks.

FIG. 48 shows the masking of the fourth object in the first image frame. As shown, the person's jacket forms the fourth object.

FIG. 49 shows the masking of the fifth object in the first image frame. As shown, the person's sleeve forms the fifth object.

FIG. 50 shows a control panel for the creation of three-dimensional images, including the association of layers and three-dimensional objects to masks within an image frame, specifically showing the creation of a Plane layer for the sleeve of the person in the image. On the right side of the screendump, the “Rotate” button is enabled, with a “Translate Z” rotation quantity showing that the sleeve is rotated forward, as is shown in the next figure.

FIG. 51 shows a three-dimensional view of the various masks shown in FIGS. 43-49, wherein the mask associated with the sleeve of the person is shown as a Plane layer that is rotated toward the left and right viewpoints on the right of the page. Also, as is shown, the masks associated with the jacket and person's face have been assigned a Z-dimension or depth that is in front of the background.

FIG. 52 shows a slightly rotated view of FIG. 51. This figure shows the Plane layer with the rotated sleeve tilted toward the viewpoints. The crystal ball is shown as a flat object, still in two-dimensions as it has not yet been assigned a three-dimensional object type.

FIG. 53 shows a slightly rotated view of FIGS. 51 (and 52), wherein the sleeve is shown tilting forward, again without ever defining a wire frame model for the sleeve. Alternatively, a three-dimensional object type of column can be applied to the sleeve to make an even more realistically three-dimensional shaped object. The Plane type is shown here for brevity.

FIG. 54 shows a control panel specifically showing the creation of a sphere object for the crystal ball in front of the person in the image. In this figure, the Sphere three-dimensional object is created and dropped into the three-dimensional image by clicking the “create selected” button in the middle of the frame, which is then shown (after translation and resizing onto the crystal ball) in the next figure.

FIG. 55 shows the application of the sphere object to the flat mask of the crystal ball, that is shown within the sphere and as projected to the front and back of the sphere to show the depth assigned to the crystal ball. The Sphere object can be translated, i.e., moved in three axes, and resized to fit the object that it is associated with. The projection of the crystal ball onto the sphere shows that the Sphere object is slightly larger than the crystal ball, however this ensures that the full crystal ball pixels are assigned depths. The Sphere object can be resized to the actual size of the sphere as well for more refined work projects as desired.
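As a rough illustration of how a Sphere primitive might assign depths to the pixels of a flat mask, the following sketch computes the height of the front surface of a sphere for each masked pixel. The function name, its parameters and the convention that larger values are nearer the viewer are assumptions made for illustration and are not taken from the embodiments described herein.

```python
import math


def sphere_depths(mask, cx, cy, radius, z_center=0.0):
    """Return {(x, y): depth} for every masked pixel under a sphere primitive.

    Assumption: depth grows toward the viewer, so the front surface of the
    sphere sits at z_center + sqrt(r^2 - dx^2 - dy^2).
    """
    depths = {}
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if not inside:
                continue
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            # Masked pixels outside the sphere's silhouette fall back to the
            # center-plane depth; a slightly oversized sphere avoids this case.
            bulge = math.sqrt(radius ** 2 - d2) if d2 <= radius ** 2 else 0.0
            depths[(x, y)] = z_center + bulge
    return depths
```

With a 3 by 3 mask centered at (1, 1) and a radius of 2, the center pixel receives the full radius as depth while the corner pixels receive less, which gives the flat mask a rounded depth profile without a wire frame model.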

FIG. 56 shows a top view of the three-dimensional representation of the first image frame, in which the Z-dimension assigned to the crystal ball shows that the crystal ball is in front of the person in the scene.

FIG. 57 shows the sleeve plane rotating in the X-axis to make the sleeve appear to be coming out of the image more. The circle with a line (X axis line) projecting through it defines the plane of rotation of the three-dimensional object, here a plane associated with the sleeve mask.

FIG. 58 shows a control panel specifically showing the creation of a Head object for application to the person's face in the image, i.e., to give the person's face realistic depth without requiring a wire frame model for example. The Head object is created using the “Created Selected” button in the middle of the screen and is shown in the next figure.

FIG. 59 shows the Head object in the three-dimensional view, too large and not aligned with the actual person's head. After creating the Head object as per FIG. 58, the Head object shows up in the three-dimensional view as a generic depth primitive that is applicable to heads in general. This is due to the fact that exact depth information is not required for the human eye. Hence, in depth assignments, generic depth primitives may be utilized in order to eliminate the need for three-dimensional wire frames. The Head object is translated, rotated and resized in subsequent figures as detailed below.

FIG. 60 shows the Head object in the three-dimensional view, resized to fit the person's face and aligned, e.g., translated to the position of the actual person's head.

FIG. 61 shows the Head object in the three-dimensional view, with the Y-axis rotation shown by the circle and Y-axis originating from the person's head thus allowing for the correct rotation of the Head object to correspond to the orientation of the person's face.

FIG. 62 shows the Head object also rotated slightly clockwise, about the Z-axis to correspond to the person's slightly tilted head. The mask shows that the face does not have to be exactly lined up for the resulting three-dimensional image to be believable to the human eye. More exacting rotation and resizing can be utilized where desired.

FIG. 63 shows the propagation of the masks into the second and final image frame. All of the methods previously disclosed above for moving masks and reshaping them are applied not only to colorization but to depth enhancement as well. Once the masks are propagated into another frame, all frames between the two frames may thus be tweened. By tweening the frames, the depth information (and color information if not a color movie) is thus applied to non-key frames.
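One simple way to picture tweening of depth between two key frames is linear interpolation, sketched below. The specification does not state which interpolation is used, so the linear blend, the function name and the parameter names are assumptions for illustration only.

```python
def tween_depth(key_a, key_b, frame_a, frame_b, frame):
    """Linearly interpolate a depth value for `frame` between two key frames.

    Assumption: depth varies linearly with frame number between the key
    frames; other easing curves could be substituted.
    """
    if frame_a == frame_b or not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between two distinct key frames")
    t = (frame - frame_a) / (frame_b - frame_a)
    return key_a + t * (key_b - key_a)
```

For example, `tween_depth(10.0, 20.0, 0, 10, 5)` yields a depth halfway between the two key frame depths for the frame halfway between them.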

FIG. 64 shows the original position of the mask corresponding to the person's hand.

FIG. 65 shows the reshaping of the mask, which is performed automatically and which can be adjusted in key frames manually if desired, wherein any intermediate frames get the tweened depth information between the first image frame masks and the second image frame masks. The automatic tracking of masks and reshaping of the masks allows for great savings in labor. Allowing manual refinement of the masks allows for precision work where desired.

FIG. 66 shows the missing information for the left viewpoint as highlighted in color on the left side of the masked objects in the lower image when the foreground object, here a crystal ball, is translated to the right. In generating the left viewpoint of the three-dimensional image, the highlighted data must be generated to fill the missing information from that viewpoint.

FIG. 67 shows the missing information for the right viewpoint as highlighted in color on the right side of the masked objects in the lower image when the foreground object, here a crystal ball, is translated to the left. In generating the right viewpoint of the three-dimensional image, the highlighted data must be generated to fill the missing information from that viewpoint. Alternatively, a single camera viewpoint may be offset from the viewpoint of the original camera, however the missing data is large for the new viewpoint. This may be utilized if there are a large number of frames and some of the missing information is found in adjacent frames for example.

FIG. 68 shows an anaglyph of the final depth enhanced first image frame viewable with Red/Blue 3-D glasses. The original two-dimensional image is now shown in three-dimensions.

FIG. 69 shows an anaglyph of the final depth enhanced second and last image frame viewable with Red/Blue 3-D glasses; note the rotation of the person's head, movement of the person's hand and movement of the crystal ball. The original two-dimensional image is now shown in three-dimensions as the masks have been moved/reshaped using the mask tracking/reshaping as described above and applying depth information to the masks in this subsequent frame from an image sequence. As described above, the operations for applying the depth parameter to a subsequent frame are performed using a general purpose computer having a central processing unit (CPU), memory, and a bus situated between the CPU and memory for example, specifically programmed to do so, wherein figures herein which show computer screen displays are meant to represent such a computer.

FIG. 70 shows the right side of the crystal ball with fill mode “smear”, wherein the pixels with missing information for the left viewpoint, i.e., on the right side of the crystal ball, are taken from the right edge of the missing image pixels and “smeared” horizontally to cover the missing information. Any other method for introducing data into hidden areas is in keeping with the spirit of the invention. Stretching or smearing pixels where information is missing creates artifacts that are recognizable to human observers as errors. By obtaining or otherwise creating realistic data for the missing information, for example via a generated background with missing information filled in, methods of filling missing data can be avoided and artifacts are thus eliminated. For example, providing a composite background or frame with all missing information designated in a way that an artist can use to create a plausible drawing or painting of a missing area is one method of obtaining missing information for use in two-dimensional to three-dimensional conversion projects.
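The “smear” fill mode can be pictured on a single image row as copying the pixel at the right edge of a gap across the gap. The sketch below is one possible reading of that description; the function name and the Boolean `missing` list are hypothetical, and a gap with no valid pixel to its right is simply left untouched.

```python
def smear_fill(row, missing):
    """Fill missing pixels in one image row from the valid pixel to their right."""
    out = list(row)
    fill = None
    # Walk right to left so each gap inherits the pixel on its right edge.
    for x in range(len(out) - 1, -1, -1):
        if missing[x]:
            if fill is not None:
                out[x] = fill
        else:
            fill = out[x]
    return out
```

Because every pixel in the gap takes the same value, the filled area shows the horizontal streaks that observers tend to recognize as artifacts, which is the motivation for generated background data.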

FIG. 71 shows a mask or alpha plane for a given frame of a scene, for an actor's upper torso and head 7101, and transparent wings 7102. The mask may include opaque areas shown as black and transparent areas that are shown as grey areas. The alpha plane may be generated for example as an 8 bit grey-scale “OR” of all foreground masks. Any other method of generating a foreground mask having motion objects or foreground object related masks defined is in keeping with the spirit of the invention.
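A minimal sketch of an 8 bit grey-scale “OR” of foreground masks follows, with each mask held as a grid of integers from 0 to 255. The function name and the sample values are hypothetical; which values denote opaque versus transparent areas is left to the embodiment.

```python
def alpha_plane(masks):
    """Bitwise-OR a list of same-sized 8-bit masks into one alpha plane."""
    height, width = len(masks[0]), len(masks[0][0])
    plane = [[0] * width for _ in range(height)]
    for mask in masks:
        for y in range(height):
            for x in range(width):
                # Keep each value within 8 bits before combining.
                plane[y][x] |= mask[y][x] & 0xFF
    return plane
```

For instance, OR-ing a torso mask `[[255, 0]]` with a wings mask `[[0, 128]]` gives `[[255, 128]]`, a single plane covering both foreground elements.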

FIG. 72 shows an occluded area, i.e., missing background image data 7201 as a colored sub-area of the actor of FIG. 71 that never uncovers the underlying background, i.e., where missing information in the background for a scene or frame occurs. This area is the area of the background that is never exposed in any frame in a scene and hence cannot be borrowed from another frame. When for example generating a composite background, any background pixel not covered by a motion object mask or foreground mask can have a simple Boolean TRUE value; all other pixels are thus the occluded pixels as is also shown in FIG. 34.
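The always-occluded area can be sketched as the set of pixels that are covered by a foreground or motion object mask in every frame of a scene. The function below is an illustrative assumption about how such a Boolean test might be accumulated over frames; the name and the data layout are hypothetical.

```python
def occluded_area(foreground_masks):
    """Return a Boolean grid that is True where the background is never exposed.

    foreground_masks: one Boolean grid per frame, True where a foreground or
    motion object covers the background in that frame.
    """
    height, width = len(foreground_masks[0]), len(foreground_masks[0][0])
    exposed = [[False] * width for _ in range(height)]
    for mask in foreground_masks:
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    # The background is visible here in at least one frame.
                    exposed[y][x] = True
    return [[not e for e in row] for row in exposed]
```

Pixels flagged True by this test are the ones for which no real background data exists anywhere in the scene, so they must be generated or filled by other means.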

FIG. 73 shows the occluded area of FIG. 72 with generated data 7201 a for missing background image data that is artistically drawn or otherwise rendered to generate a complete and realistic background for use in artifact free two-dimensional to three-dimensional conversion. See also FIG. 34 and the description thereof. As shown, FIG. 73 also has masks drawn on background objects, which are shown in colors that differ from the source image. This allows for colorization or colorization modifications for example as desired.

FIG. 73A shows the occluded area with missing background image data 7201 b partially drawn or otherwise rendered to generate just enough of a realistic looking background for use in artifact free two-dimensional to three-dimensional conversion. An artist in this example may draw narrower versions of the occluded areas, so that offsets to foreground objects would have enough realistic background to work with when projecting a second view, i.e., translating a foreground object horizontally which exposes occluded areas. In other words, the edges of the missing background image data area may be drawn horizontally inward by enough to allow for some of the generated data to be used, or all of the generated data to be used in generating a second viewpoint for a three-dimensional image set.

In one or more embodiments of the invention, a number of scenes from a movie may be generated for example by computer drawing by artists or sent to artists for completion of backgrounds. In one or more embodiments, a website may be created for artists to bid on background completion projects wherein the website is hosted on a computer system connected for example to the Internet. Any other method for obtaining backgrounds with enough information to render a two-dimensional frame into a three-dimensional pair of viewpoints is in keeping with the spirit of the invention, including rendering a full background with realistic data for all of the occluded area of FIG. 72 (which is shown in FIG. 73) or only a portion of the edges of the occluded area of FIG. 72 (which is shown as FIG. 73A). By estimating a background depth and a depth to a foreground object and knowing the offset distance desired for two viewpoints, it is thus possible to obtain less than the whole occluded area for use in artifact free two-dimensional to three-dimensional conversion. In one or more embodiments, a fixed offset, e.g., 100 pixels on each edge of each occluded area, or a percentage of the size of the foreground object, e.g., 5%, may be flagged to be created and if more data is needed, then the frame is flagged for updating, or smearing or pixel stretching may be utilized to minimize the artifacts of missing data.
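The fixed-offset and percentage rules above can be sketched as a small helper that decides how wide an edge band of each occluded area to flag for generation, and whether a frame needs updating. Both function names, the parameters and the rounding rule are assumptions for illustration; the specification gives only the example values of 100 pixels and 5%.

```python
def fill_band_width(object_width_px, fixed_px=None, percent=None):
    """Width in pixels of the occluded-area edge band to flag for generation."""
    if fixed_px is not None:
        return fixed_px
    if percent is None:
        raise ValueError("supply either fixed_px or percent")
    # Integer ceiling of object_width_px * percent / 100.
    return (object_width_px * percent + 99) // 100


def needs_update(band_px, required_offset_px):
    """True when a viewpoint offset would expose more than the generated band."""
    return required_offset_px > band_px
```

For a foreground object 400 pixels wide, the 5% rule flags a 20 pixel band; a viewpoint offset of 35 pixels would then exceed the band, so the frame would be flagged for updating or fall back to smearing.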

FIG. 74 shows a light area of the shoulder portion on the right side of FIG. 71, where missing background image data 7201 exists when generating a right viewpoint for a right image of a three-dimensional image pair. Missing background image data 7201 represents a gap where stretching (as is also shown in FIG. 70) or other artifact producing techniques would be used when shifting the foreground object to the left to create a right viewpoint. The dark portion of the figure is taken from the background where data is available in at least one frame of a scene.

FIG. 75 shows an example of the stretching of pixels, or “smeared pixels” 7201 c, corresponding to the light area in FIG. 74, i.e., missing background image data 7201, wherein the pixels are created without the use of a generated background, i.e., if no background data is available for an area that is occluded in all frames of a scene.

FIG. 76 shows a result of a right viewpoint without artifacts on the edge of the shoulder of the person through use of generated data 7201 a (or 7201 b) for missing background image data 7201, shown for always-occluded areas of a scene.

FIG. 77 shows an example of a computer-generated element, here robot 7701, which is modeled in three-dimensional space and projected as a two-dimensional image. The background is grey to signify invisible areas. As is shown in the following figures, metadata such as alpha, mask, depth or any combination thereof is utilized to speed the conversion process from two-dimensional image to a pair of two-dimensional images for left and right eye for three-dimensional viewing. Masking this character by hand, or even in a computer-aided manner by an operator, is extremely time consuming since there are literally hundreds if not thousands of sub-masks required to render depth (and/or color) correctly to this complex object.

FIG. 78 shows an original image separated into background 7801 and foreground elements 7802 and 7803 (mountain and sky in the background and soldiers in the bottom left; see also FIG. 79), along with the imported color and depth of the computer-generated element, i.e., robot 7803 with depth automatically set via the imported depth metadata. Although the soldiers exist in the original image, their depths are set by an operator, and generally shapes or masks with varying depths are applied at these depths with respect to the original objects to obtain a pair of stereo images for left and right eye viewing. (See FIG. 79). As shown in the background, any area that is covered for the scene such as outline 7804 (of a soldier's head projected onto the background) can be artistically rendered for example to provide believable missing data, as is shown in FIG. 73 based on the missing data of FIG. 73A, which results in artifact free edges as shown in FIG. 76 for example. Importing data for computer generated elements may include reading a file that has depth information on a pixel-by-pixel basis for computer-generated element 7701 and displaying that information in a perspective view on a computer display as an imported element, e.g., robot 7803. This import process saves enormous amounts of operator time and makes conversion of a two-dimensional movie into a three-dimensional movie economically viable. One or more embodiments of the invention store the masks and imported data in computer memory and/or computer disk drives for use by one or more computers in the conversion process.

FIG. 79 shows mask 7901 (forming a portion of the helmet of the rightmost soldier) associated with the photograph of soldiers 7802 in the foreground. Mask 7901, along with all other operator-defined masks shown in multiple artificial colors on the soldiers, is used to apply depth to the various portions of the soldiers occurring in the original image that lie in depth in front of the computer-generated element, i.e., robot 7803. The dashed lines horizontally extending from the mask areas 7902 and 7903 show where horizontal translation of the foreground objects takes place and where imported metadata can be utilized to accurately auto-correct over-painting of depth or color on the masked objects when metadata exists for the other elements of a movie. For example, when an alpha exists for the objects that occur in front of the computer-generated elements, the edges can be accurately determined. One type of file that can be utilized to obtain mask edge data is a file with alpha and/or mask data such as an RGBA file. (See FIG. 80). In addition, use of generated data for missing areas of the background at these horizontally translated mask areas 7902 and 7903 enables artifact free two-dimensional to three-dimensional conversion.

FIG. 80 shows an imported alpha layer 8001 shown as a dark blue overlay, which can also be utilized as a mask layer to limit the operator-defined, and potentially less accurate, masks used for applying depth to the edges of the three soldiers 7802, designated as soldiers A, B and C. In addition, an optional computer-generated element, such as dust, can be inserted into the scene along the line annotated as “DUST”, to augment the reality of the scene if desired. Any of the background, foreground or computer-generated elements can be utilized to fill portions of the final left and right image pairs as is required.

FIG. 81 shows the result of using the operator-defined masks without adjustment when overlaying a motion element such as the soldier on the computer-generated element such as the robot. Without the use of metadata associated with the original image objects, such as matte or alpha 8001, artifacts occur where operator-defined masks do not exactly align with the edges of the masked objects. In the topmost picture, the soldier's lips show a light colored edge 8101 while the lower picture shows an artifact free edge since the alpha of FIG. 80 is used to limit the edges of any operator-defined masks. Through use of the alpha metadata of FIG. 80 applied to the operator-defined mask edges of FIG. 79, artifact free edges on the overlapping areas are thus enabled. As one skilled in the art will appreciate, application of successively nearer elements combined with their alphas is used to layer all of the objects at their various depths from back to front to create a final image pair for left eye and right eye viewing.
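Back-to-front layering of elements with their alphas can be sketched with the conventional “over” blend, shown below on a single row of grey values. The specification does not name a particular blend equation, so the formula, the function name and the use of floating point alphas between 0 and 1 are assumptions for illustration.

```python
def composite_back_to_front(layers):
    """Composite (pixels, alphas) rows, ordered farthest first, into one row.

    Assumption: alphas are floats in [0, 1] and the farthest layer is fully
    opaque, so it initializes the output row.
    """
    pixels0, _ = layers[0]
    out = [float(p) for p in pixels0]
    for pixels, alphas in layers[1:]:
        for x, (p, a) in enumerate(zip(pixels, alphas)):
            # Conventional "over" blend: the nearer layer hides what lies
            # behind it in proportion to its alpha.
            out[x] = a * p + (1.0 - a) * out[x]
    return out
```

With a background row, then a robot row, then a soldier row, each output pixel ends up showing the nearest element whose alpha covers it, which is why an accurate imported alpha at the soldier's edge avoids the light colored fringe.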

Embodiments of the invention enable real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove artifacts and to minimize or eliminate iterative workflow paths back through different workgroups by generating translation files that can be utilized as portable pixel-wise editing files. For example, a mask group takes source images and creates masks for items, areas or human recognizable objects in each frame of a sequence of images that make up a movie. The depth augmentation group applies depths, and for example shapes, to the masks created by the mask group. When rendering an image pair, left and right viewpoint images and left and right translation files may be generated by one or more embodiments of the invention. The left and right viewpoint images allow 3D viewing of the original 2D image. The translation files specify the pixel offsets for each source pixel in the original 2D image, for example in the form of UV or U maps. These files are generally related to an alpha mask for each layer, for example a layer for an actress, a layer for a door, a layer for a background, etc. These translation files, or maps, are passed from the depth augmentation group that renders 3D images, to the quality assurance workgroup.
This allows the quality assurance workgroup (or other workgroup such as the depth augmentation group) to perform real-time editing of 3D images without re-rendering for example to alter layers/colors/masks and/or remove artifacts such as masking errors without delays associated with processing time/re-rendering and/or iterative workflow that requires such re-rendering or sending the masks back to the mask group for rework, wherein the mask group may be in a third world country with unskilled labor on the other side of the globe. In addition, when rendering the left and right images, i.e., 3D images, the Z depth of regions within the image, such as actors for example, may also be passed along with the alpha mask to the quality assurance group, who may then adjust depth as well without re-rendering with the original rendering software. This may be performed for example with generated missing background data from any layer so as to allow “downstream” real-time editing without re-rendering or ray-tracing for example. Quality assurance may give feedback to the masking group or depth augmentation group for individuals so that these individuals may be instructed to produce work product as desired for the given project, without waiting for, or requiring the upstream groups to rework anything for the current project. This allows for feedback yet eliminates iterative delays involved with sending work product back for rework and the associated delay for waiting for the reworked work product. Elimination of iterations such as this provides a huge savings in wall-time, or end-to-end time that a conversion project takes, thereby increasing profits and minimizing the workforce needed to implement the workflow.

FIG. 82 shows a source image to be depth enhanced and provided along with left and right translation files (see FIGS. 85A-D and 86A-D for embodiments of translation files) and alpha masks (such as shown in FIG. 79) to enable real-time editing of 3D images without re-rendering or ray-tracing the entire image sequence in a scene (e.g., by downstream workgroups) for example to alter layers/colors/masks and/or remove artifacts and/or adjust depths or otherwise change the 3D images without iterative workflow paths back to the original workgroups (as per FIG. 96 versus FIG. 95).

FIG. 83 shows masks generated by the mask workgroup for the application of depth by the depth augmentation group, wherein the masks are associated with objects, such as for example human recognizable objects in the source image of FIG. 82. Generally, unskilled labor is utilized to mask human recognizable objects in key frames within a scene or sequence of images. The unskilled labor is cheap and generally located offshore. Hundreds of workers may be hired at low prices to perform this tedious work associated with masking. Any existing colorization masks may be utilized as a starting point for 3D masks, which may be combined to form a 3D mask outline that is broken into sub-masks that define differing depths within a human recognizable object. Any other method of obtaining masks for areas of an image is in keeping with the spirit of the invention.

FIG. 84 shows areas where depth is applied generally as darker for nearer objects and lighter for objects that are further away. This view gives a quick overview of the relative depths of objects in a frame.

FIG. 85A shows a left UV map containing translations or offsets in the horizontal direction for each source pixel. When rendering a scene with depths applied, translation maps that map the offsets of horizontal movement of individual pixels in a graphical manner may be utilized. FIG. 85B shows a right UV map containing translations or offsets in the horizontal direction for each source pixel. Since each of these images looks the same, it is easier to observe that there are subtle differences in the two files by shifting the black value of the color, so as to highlight the differences in a particular area of FIGS. 85A and 85B. FIG. 85C shows a black value shifted portion of the left UV map of FIG. 85A to show the subtle contents therein. This area corresponds to the tree branches shown in the upper right corner of FIGS. 82, 83 and 84 just above the cement mixer truck and to the left of the light pole. FIG. 85D shows a black value shifted portion of the right UV map of FIG. 85B to show the subtle contents therein. The branches shown in the slight variances of color signify that those pixels would be shifted to the corresponding location in a pure UV map that maps Red from darkest to lightest in the horizontal direction and maps Green from darkest to lightest in the vertical direction. In other words, the translation map in the UV embodiment is a graphical depiction of the shifting that occurs when generating a left and right viewpoint with respect to the original source image. UV maps may be utilized, however, any other file type that contains horizontal offsets from a source image on a pixel-by-pixel basis (or finer grained) may be utilized, including compressed formats that are not readily viewable as images. Some software packages for editing come with pre-built UV widgets, and hence UV translation files or maps can be utilized if desired.
For example, certain compositing programs have pre-built objects that enable UV maps to be readily utilized and otherwise manipulated graphically and hence for these implementations, graphically viewable files may be utilized, but are not required.

Since creation of a left and right viewpoint from a 2D image uses horizontal shifts, it is possible to use a single color for the translation file. For example, since each row of the translation file is already indexed in a vertical direction based on the location in memory, it is possible to simply use one increasing color, for example Red in the horizontal direction to signify an original location of a pixel. Hence, any shift of pixels in the translation map is shown as a shift of a given pixel value from one horizontal offset to another, which makes for subtle color changes when the shifts are small, for example in the background. FIG. 86A shows a left U map containing translations or offsets in the horizontal direction for each source pixel. FIG. 86B shows a right U map containing translations or offsets in the horizontal direction for each source pixel. FIG. 86C shows a black value shifted portion of the left U map of FIG. 86A to show the subtle contents therein. FIG. 86D shows a black value shifted portion of the right U map of FIG. 86B to show the subtle contents therein. Again there is no requirement that a humanly viewable file format be utilized, and any format that stores horizontal offsets on a pixel-by-pixel basis relative to a source image may be utilized. Since memory and storage are so cheap, any format whether compressed or not may be utilized without any significant increase in cost. Generally, creation of a right eye image makes foreground portions of the U map (or UV map) appear darker since they are shifting left and vice versa. This is easy to observe by looking at something in the foreground with only the right eye open and then moving slightly to the right (to observe that the foreground object has indeed been shifted to the left). Since the U map (or UV map) in the unaltered state is a simple ramp of color from dark to light, it then follows that shifting something to the left, i.e., for the right viewpoint, maps it to a darker area of the U map (or UV map).
Hence the same tree branches in the same area of each U map (or UV map) are darker for the right eye and brighter for the left eye with respect to un-shifted pixels. Again, use of a viewable map is not required, but shows the concept of shifting that occurs for a given viewpoint.
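The ramp-and-shift behavior of a U map can be sketched on one row as follows. The unaltered row maps each source pixel to its own column, and a foreground mask receives a negative (leftward) offset for the right viewpoint and a positive offset for the left viewpoint. The function names, the 8 pixel row and the offset of 2 are hypothetical values chosen for illustration.

```python
def u_ramp(width):
    """Unaltered U map row: each source pixel maps to its own column."""
    return list(range(width))


def shift_masked(u_row, mask_row, offset):
    """Add a horizontal offset (negative = left) to the masked source pixels."""
    return [u + offset if m else u for u, m in zip(u_row, mask_row)]


# Hypothetical foreground object occupying columns 3 and 4 of an 8 pixel row.
mask = [False, False, False, True, True, False, False, False]
right_u = shift_masked(u_ramp(8), mask, -2)  # foreground shifts left
left_u = shift_masked(u_ramp(8), mask, +2)   # foreground shifts right
```

In this example the masked entries of `right_u` hold smaller values than the same entries of `left_u`, which corresponds to the foreground appearing darker in the right eye map and brighter in the left eye map when the values are viewed as a color ramp.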

FIG. 87 shows known uses for UV maps, wherein a three-dimensional model is unfolded so that an image in UV space can be painted onto the 3D model using the UV map. This figure shows how UV maps have traditionally been utilized to apply a texture map to a 3D shape. For example, the texture, here a painting or flat set of captured images of the Earth, is mapped to a U and V coordinate system, that is translated to an X, Y and Z coordinate on the 3D model. Traditional animation has been performed in this manner in that wire frame models are unraveled and flattened, which defines the U and V coordinate system in which to apply a texture map.

Embodiments of the invention described herein utilize UV and U maps in a new manner in that a pair of maps are utilized to define the horizontal offsets for two images (left and right) that each source pixel is translated to as opposed to a single map that is utilized to define a coordinate onto which a texture map is placed on a 3D model or wire frame. I.e., embodiments of the invention utilize UV and U maps (or any other horizontal translation file format) to allow for adjustments to the offset objects without re-rendering the entire scene. Again, as opposed to the known use of a UV map, for example that maps two orthogonal coordinates to a three-dimensional object, embodiments of the invention enabled herein utilize two maps, i.e., one for a left and one for a right eye, that map horizontal translations for the left and right viewpoints. In other words, since pixels translate only in the horizontal direction (for left and right eyes), embodiments of the invention map within one-dimension on a horizontal line-by-line basis. I.e., the known art maps 2 dimensions to 3 dimensions, while embodiments of the invention utilize 2 maps of translations within 1 dimension (hence visible embodiments of the translation map can utilize one color). For example, if one line of a translation file contains 0, 1, 2, 3 . . . 1918, 1919, and the 2nd and 3rd pixels are translated right by 4 pixels, then the line of the file would read 0, 5, 6, 3 . . . 1918, 1919. Other formats showing relative offsets are not viewable as ramped color areas, but may provide great compression levels, for example a line of the file using relative offsets may read, 0, 0, 0, 0 . . . 0, 0, while a right shift of 4 pixels in the 2nd and 3rd pixels would make the file read 0, 4, 4, 0, . . . 0, 0. This type of file can be compressed to a great extent if there are large portions of background that have zero horizontal offsets in both the right and left viewpoints.
However, this file could be viewed as a standard U file if it were ramped, i.e., made absolute as opposed to relative, to view as a color-coded translation file. Any other format capable of storing offsets for horizontal shifts for left and right viewpoints may be utilized in embodiments of the invention. UV files similarly have a ramp function in the Y or vertical axis as well; the values in such a file would be (0,0), (0,1), (0,2) . . . (0, 1918), (0,1919) corresponding to each pixel, for example for the bottom row of the image and (1,0), (1,1), etc., for the second horizontal line, or row for example. This type of offset file allows for movement of pixels in non-horizontal rows, however embodiments of the invention simply shift data horizontally for left and right viewpoints, and so do not need to keep track of which vertical row a source pixel moves to since horizontal movement is by definition within the same row.
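The relationship between the absolute (ramped) and relative forms of a translation row can be sketched as adding or subtracting the column index, as shown below. Run-length encoding is used here only as a stand-in to suggest why the relative form may compress well; it is an assumption, as are the function names and the made-up 8 pixel row in the usage note.

```python
def to_relative(absolute_row):
    """Convert destination columns into per-pixel horizontal offsets."""
    return [u - x for x, u in enumerate(absolute_row)]


def to_absolute(relative_row):
    """Re-apply the ramp so the row is viewable as a color-coded U row."""
    return [x + d for x, d in enumerate(relative_row)]


def run_length_encode(row):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs
```

For a made-up row in which the pixels at columns 2 and 3 move right by 2, the absolute form `[0, 1, 4, 5, 4, 5, 6, 7]` encodes to eight runs, whereas the relative form `[0, 0, 2, 2, 0, 0, 0, 0]` encodes to three, and `to_absolute` recovers the viewable row from the relative one.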

FIG. 88 shows a disparity map showing the areas where the difference between the left and right translation maps is the largest. This shows that objects closest to the viewer have pixels that are shifted the most between the two UV (or U) maps shown in FIGS. 85A-B (or 86A-B).
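As a minimal sketch only, a disparity map of this kind may be computed as the per-pixel absolute difference between the left and right translation maps. The sketch assumes relative offsets stored as rows of integers, with positive values meaning a shift to the right; the sign convention, the sample values and the function names are hypothetical.

```python
from typing import List, Tuple

Map = List[List[int]]


def disparity_map(left: Map, right: Map) -> Map:
    """Per-pixel absolute difference between left and right offsets."""
    return [
        [abs(l - r) for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


def peak_disparity(disparity: Map) -> Tuple[int, int, int]:
    """Return (value, row, column) of the first largest disparity."""
    best = (-1, -1, -1)
    for y, row in enumerate(disparity):
        for x, value in enumerate(row):
            if value > best[0]:
                best = (value, y, x)
    return best


# Hypothetical 2 x 6 maps: the middle columns belong to a near object.
left_offsets = [[0, 0, 2, 2, 0, 0], [0, 1, 3, 3, 1, 0]]
right_offsets = [[0, 0, -2, -2, 0, 0], [0, -1, -3, -3, -1, 0]]

disparity = disparity_map(left_offsets, right_offsets)
print(disparity)
print(peak_disparity(disparity))
```

The largest values fall on the pixels that are shifted the most between the two maps, i.e., on the object assumed to be closest to the viewer.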

FIG. 89 shows a left eye rendering of the source image of FIG. 82. FIG. 90 shows a right eye rendering of the source image of FIG. 82. FIG. 91 shows an anaglyph of the images of FIG. 89 and FIG. 90 for use with Red/Blue glasses.

FIG. 92 shows an image that has been masked and is in the process of depth enhancement for the various layers, including the actress layer, door layer, and background layer (showing missing background information that may be filled in through generation of missing information; see FIGS. 34, 73 and 76 for example). I.e., the empty portion of the background behind the actress in FIG. 92 can be filled with generated image data (see the outline of the actress's head on the background wall). Through utilization of generated image data for each layer, a compositing program for example may be utilized as opposed to re-rendering or ray-tracing all images in a scene for real-time editing. For example, if the hair mask of the actress in FIG. 92 is altered to more correctly cover the hair, then any pixels uncovered by the new mask are obtained from the background and are nearly instantaneously available to view (as opposed to standard re-rendering or ray-tracing that can take hours of processing power to re-render all of the images in a scene when anything in a scene is edited). This may include obtaining generated data for any layer including the background for use in artifact free 3D image generation.

FIG. 93 shows a UV map overlaid onto an alpha mask associated with the actress shown in FIG. 92 which sets the translation offsets in the resulting left and right UV maps based on the depth settings of the various pixels in the alpha mask. This UV layer may be utilized with other UV layers to provide a quality assurance workgroup (or other workgroup) with the ability to real-time edit the 3D images, for example to correct artifacts, or correct masking errors without re-rendering an entire image. Iterative workflows however may require sending the frame back to a third-world country for rework of the masks, which are then sent back to a different workgroup for example in the United States to re-render the image, which is then viewed again by the quality assurance workgroup. This type of iterative workflow is eliminated for minor artifacts altogether since the quality assurance workgroup can simply reshape an alpha mask and regenerate the pixel offsets from the original source image to edit the 3D images in real-time and avoid involving other workgroups for example. Setting the depth of the actress as per FIGS. 42-70 for example or in any other method determines the amount of shift that the unaltered UV map undergoes to generate two UV maps, one for left-eye and one for right-eye image manipulation as per FIGS. 85A-D (or U maps in FIGS. 86A-D). The maps may be supplied for each layer along with an alpha mask for example to any compositing program, wherein changes to a mask for example allow the compositing program to simply obtain pixels from other layers to “add up” an image in real-time. This may include using generated image data for any layer (or gap fill data if no generated data exists for a deeper layer). One skilled in the art will appreciate that a set of layers with masks are combined in a compositing program to form an output image by arbitrating or otherwise determining which layers and corresponding images to lay on top of one another to form an output image. Any method of combining a source image pixel to form an output pixel using a pair of horizontal translation maps without re-rendering or ray-tracing again after adding depth is in keeping with the spirit of the invention.

FIG. 94 shows a workspace generated for a second depth enhancement program, based on the various layers shown in FIG. 92, i.e., left and right UV translation maps for each of the alphas wherein the workspace allows for quality assurance personnel (or other workgroups) to adjust masks and hence alter the 3D image pair (or anaglyph) in real-time without re-rendering or ray-tracing and/or without iteratively sending fixes to any other workgroup. One or more embodiments of the invention may loop through a source file for the number of layers and create a script that generates the workspace as shown in FIG. 94. For example, once the mask workgroup has created the masks for the various layers and generated mask files, the rendering group may read in the mask files programmatically and generate script code that includes generation of a source icon, alpha copy icons for each layer, left and right UV maps for each layer based on the rendering group's rendered output, and other icons to combine the various layers into left and right viewpoint images. This allows the quality assurance workgroup to utilize tools that they are familiar with and which may be faster and less complex than the rendering tools utilized by the rendering workgroup. Any method for generation of a graphical user interface for a worker to enable real-time editing of 3D images including a method to create a source icon for each frame, that connects to an alpha mask icon for each layer and generates translation maps for left and right viewpoints that connect to one another and loops for each layer until combining with an output viewpoint for 3D viewing is in keeping with the spirit of the invention. Alternatively, any other method that enables real-time editing of images without re-rendering through use of a pair of translation maps is in keeping with the spirit of the invention even if the translation maps are not viewable or not shown to the user.

FIG. 95 shows an iterative corrective workflow. A mask workgroup generates masks for objects, such as for example, human recognizable objects or any other shapes in an image sequence at 9501. This may include generation of groups of sub-masks and the generation of layers that define different depth regions. This step is generally performed by unskilled and/or low wage labor, generally in a country with very low labor costs. The masked objects are viewed by higher skilled employees, generally artists, who apply depth and/or color to the masked regions in a scene at 9502. The artists are generally located in an industrialized country with higher labor costs. Another workgroup, generally a quality assurance group, then views the resulting images at 9503 and determines if there are any artifacts or errors that need fixing based on the requirements of the particular project. If so, the masks with errors or locations in the image where errors are found are sent back to the masking workgroup for rework, i.e., from 9504 to 9501. Once there are no more errors, the process completes at 9505. Even in smaller workgroups, errors may be corrected by reworking masks and re-rendering or otherwise ray-tracing all of the images in a scene, which can take hours of processing time to make a simple change for example. Errors in depth judgment generally occur less often as the higher skilled laborers apply depths based on a higher skill level, and hence kickbacks to the rendering group occur less often in general; hence this loop is not shown in the figure for brevity although this iterative path may occur. Masking “kickback” may take a great amount of time to work back through the system since the work product must be re-masked and then re-rendered by other workgroups.

FIG. 96 shows an embodiment of the workflow enabled by one or more embodiments of the system in that each workgroup can perform real-time editing of 3D images without re-rendering, for example to alter layers/colors/masks and/or remove artifacts and otherwise correct work product from another workgroup without iterative delays associated with re-rendering/ray-tracing or sending work product back through the workflow for corrections. The generation of masks occurs as in FIG. 95 at 9501, and depth is applied as occurs in FIG. 95 at 9502. In addition, the rendering group generates translation maps that accompany the rendered images to the quality assurance group at 9601. The quality assurance group views the work product at 9503 as in FIG. 95 and also checks for artifacts as in FIG. 95 at 9504. However, since the quality assurance workgroup (or other workgroup) has translation maps, and the accompanying layers and alpha masks, they can edit 3D images in real-time or otherwise locally correct images without re-rendering at 9602, for example using commercially available compositing programs such as NUKE® as one skilled in the art will appreciate. For instance as is shown in FIG. 94, the quality assurance workgroup can open a graphics program that they are familiar with (as opposed to a complex rendering program used by the artists), and adjust an alpha mask for example wherein the offsets in each left and right translation map are reshaped as desired by the quality assurance workgroup and the output images are formed layer by layer (using any generated missing background information as per FIGS. 34, 73 and 76 and any computer generated element layers as per FIG. 79). As one skilled in the art will recognize, generating two output images from furthest back layer to foreground layer can be done without ray-tracing, by only overlaying pixels from each layer onto the final output images nearly instantaneously. This effectively allows for local pixel-by-pixel image manipulation by the quality assurance workgroup instead of 3D modeling and ray-tracing, etc., as utilized by the rendering workgroup. This can save multiple hours of processing time and/or delays associated with waiting for other workers to re-render a sequence of images that make up a scene.
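By way of example only, forming an output viewpoint layer by layer, from the furthest back layer to the foreground layer, may be sketched as shown below. The sketch assumes each layer carries source pixels, an alpha mask and a pair of relative horizontal offset maps; the dictionary keys, the single-character pixel values and the function name are hypothetical and do not represent the interface of any commercial compositing program.

```python
from typing import Dict, List, Optional

Image = List[List[Optional[str]]]


def composite_view(layers: List[Dict], width: int, height: int, view: str) -> Image:
    """Overlay layers back to front, shifting each masked pixel horizontally."""
    out: Image = [[None] * width for _ in range(height)]
    for layer in layers:  # layers are ordered from furthest back to foreground
        offsets = layer[view]
        for y in range(height):
            for x in range(width):
                if layer["alpha"][y][x]:
                    tx = x + offsets[y][x]  # movement stays within the same row
                    if 0 <= tx < width:
                        out[y][tx] = layer["pixels"][y][x]
    return out


WIDTH, HEIGHT = 6, 1
zero = [[0] * WIDTH]

# Background layer, including generated data behind the foreground object.
background = {"pixels": [["B"] * WIDTH], "alpha": [[1] * WIDTH],
              "left": zero, "right": zero}
# Foreground layer: masked columns 2 and 3 shift right for the left eye
# and left for the right eye.
foreground = {"pixels": [[None, None, "A", "A", None, None]],
              "alpha": [[0, 0, 1, 1, 0, 0]],
              "left": [[0, 0, 1, 1, 0, 0]],
              "right": [[0, 0, -1, -1, 0, 0]]}

left = composite_view([background, foreground], WIDTH, HEIGHT, "left")
right = composite_view([background, foreground], WIDTH, HEIGHT, "right")
print(left, right)

# Reshaping the alpha mask uncovers a pixel, which is taken from the
# background layer on the next composite with no re-rendering step.
foreground["alpha"] = [[0, 0, 0, 1, 0, 0]]
left_edited = composite_view([background, foreground], WIDTH, HEIGHT, "left")
print(left_edited)
```

In this sketch a mask edit only requires another pass of pixel overlays, which illustrates why such an edit may be viewed nearly instantaneously as compared with ray-tracing a scene.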

FIGS. 118-126 illustrate exemplary embodiments of the invention that include the grouping tool interface 11801. In one or more embodiments of the invention, the multi-stage production pipeline system for the motion picture projects includes the motion picture project management system described above. According to at least one embodiment, the multi-stage production pipeline system includes the computer 9702 and the database 9701 coupled with the computer 9702 as previously described in FIG. 97. In at least one embodiment, the plurality of images are associated with a motion picture and the database further includes metadata associated with at least one shot or associated with regions within the plurality of images in the at least one shot, or both. In at least one embodiment of the invention, the multi-stage production pipeline system includes a project table, such that the project table includes a project identifier and description of a project related to the motion picture as previously described.

FIG. 118 shows the grouping tool interface 11801 that may include metadata categories 11802 associated with shot characteristics or elements in the images or scenes, text field 11803 to add additional metadata to the one or more metadata categories 11802, sort options 11804 for one or more search results and a refresh search tab 11805. According to one or more embodiments of the invention, the metadata categories 11802 for shot characteristics may include a range of elements such as locale, subject, framing, depth complexity and clean plate complexity. In one or more embodiments, the metadata associated with the at least one shot is associated with the metadata category 11802, wherein the metadata category 11802 includes one or more of the depth complexity associated with the shot and the clean plate complexity associated with the shot. The grouping tool interface 11801 may present user interface elements, accept input of metadata and accept selected shots associated with the metadata via the user interface elements. By way of one or more embodiments, the metadata associated with the at least one shot is associated with at least one metadata category 11802. In at least one embodiment of the invention, the grouping tool interface may accept at least one additional metadata category via text field 11803 and additional metadata values associated with the metadata category 11802.

For example, if a certain character is to be tagged, then a metadata value is added to the “subject” metadata category wherein the subject's value may be the actor's name. Alternatively, if a chair exists in multiple scenes, then the artist or manager may enter a value of “chair” and tag particular masks associated with the chair and associate the metadata value with the scenes where the chair appears. This enables an artist to work on the chair in non-linear time sequence, which produces extremely consistent results, shortens the time for applying color and/or depth to the object and lowers cost for the project.

In one or more embodiments, as explained above, the metadata category 11802 comprises one or more of a locale or location at which the shot was obtained, a subject that appears in the shot wherein the subject is a person, place or thing, a shot framing associated with the shot wherein the shot framing includes one or more of a close up or CU, a mid shot or MS, a wide shot or WS, and an extreme wide shot or XWS. In at least one embodiment, the grouping tool interface and the metadata categories may transition shot framing including any combination of CU, MS, WS and XWS, such that a CU may transition to a MS, WS or XWS, and/or a MS may transition to a CU, WS or XWS, and/or a WS may transition to a CU, MS or XWS, and/or a XWS may transition to a CU, MS or WS.

FIG. 119 illustrates a view of the grouping tool interface with metadata tags and shot tables according to one or more embodiments of the invention. According to FIG. 119, the grouping tool interface 11801 presents a shot table image 11901, at least one tab 11902 that adds metadata to pre-selected scenes or images within the scenes, at least one tab 11903 that replaces pre-existing metadata associated with a scene or images within the scene, metadata tags 11904 and at least one shot list 11905, for example as stored in the shot table in database 9701. In at least one embodiment of the invention, the computer 9702 may perform one or more of the following: store the metadata 11904 associated with the selected shots in the shot list 11905, accept selected metadata to search the at least one shot, query the shot list 11905 with the selected metadata associated with the at least one shot or regions within a plurality of images in the at least one shot or both, and display a list of shots having the selected metadata. By way of one or more embodiments, the database 9701 includes the shot table 11905 with a shot identifier associated with a plurality of images that are ordered in time and that make up a shot, such that the shot table 11905 has a starting frame value and an ending frame value associated with each shot.

Once a user has searched for the shots to add/change metadata to, or to find scenes to work on with a particular subject for example, the user may select one or more of the metadata tags 11904 from the metadata category columns 11802. In one or more embodiments, the list of shots may include at least one shot that is non-sequential in time in the motion picture with respect to another shot in the list of shots, which enables non-linear workflow to occur on the motion picture project. The computer 9702 may assign work tasks based on the list of shots wherein the list of shots includes the at least one shot that is non-sequential in time in the motion picture with respect to another shot in the list of shots. Once the selected shots have the appropriate metadata tagged, the user may search for shots that share the same metadata tags in order to group or “bucket” the respective shots into groups.
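As one possible illustration, storing metadata against shots and querying for a list of shots that share selected metadata may be sketched with an in-memory relational database as follows. The table names, column names, shot identifiers and frame values are hypothetical and are chosen only for this sketch; they are not the schema of database 9701.

```python
import sqlite3
from typing import Dict, List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shot (
        shot_id     TEXT PRIMARY KEY,
        start_frame INTEGER NOT NULL,
        end_frame   INTEGER NOT NULL
    );
    CREATE TABLE shot_metadata (
        shot_id  TEXT REFERENCES shot(shot_id),
        category TEXT NOT NULL,
        value    TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO shot VALUES (?, ?, ?)", [
    ("S010", 1, 120), ("S020", 121, 200), ("S030", 201, 340), ("S040", 341, 400),
])
conn.executemany("INSERT INTO shot_metadata VALUES (?, ?, ?)", [
    ("S010", "subject", "MAX"), ("S010", "locale", "SHIP_ERIC"), ("S010", "framing", "CU"),
    ("S020", "subject", "NORA"), ("S020", "locale", "LIVINGROOM"), ("S020", "framing", "WS"),
    ("S030", "subject", "MAX"), ("S030", "locale", "SHIP_ERIC"), ("S030", "framing", "MS"),
    ("S040", "subject", "MAX"), ("S040", "locale", "SHIP_ERIC"), ("S040", "framing", "CU"),
])


def find_shots(db: sqlite3.Connection, selected: Dict[str, str]) -> List[str]:
    """Return shot ids, in time order, that carry every selected tag."""
    if not selected:
        return []
    clauses = " OR ".join("(m.category = ? AND m.value = ?)" for _ in selected)
    params: List = [p for pair in selected.items() for p in pair]
    sql = f"""
        SELECT s.shot_id
        FROM shot s JOIN shot_metadata m ON m.shot_id = s.shot_id
        WHERE {clauses}
        GROUP BY s.shot_id
        HAVING COUNT(DISTINCT m.category) = ?
        ORDER BY s.start_frame
    """
    return [row[0] for row in db.execute(sql, params + [len(selected)])]


def is_non_sequential(db: sqlite3.Connection, shot_ids: List[str]) -> bool:
    """True if any neighboring shots in the list are not adjacent in time."""
    frames = [db.execute("SELECT start_frame, end_frame FROM shot WHERE shot_id = ?",
                         (sid,)).fetchone() for sid in shot_ids]
    return any(nxt[0] != prev[1] + 1 for prev, nxt in zip(frames, frames[1:]))


group = find_shots(conn, {"subject": "MAX", "locale": "SHIP_ERIC", "framing": "CU"})
print(group, is_non_sequential(conn, group))
```

In this sketch the two matching shots are separated in time by other shots, so the resulting list can be grouped and assigned together even though the shots are non-sequential in the motion picture.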

FIG. 120 illustrates a view of the grouping tool interface 11801 with shots that share metadata characteristics according to one or more embodiments of the invention. For example, as shown in FIG. 120, a user may search for all the sequential shots that contain the subject: “MAX”, locale: “SHIP_ERIC”, and framing: “CU”, as presented in the respective metadata category columns 11802. FIG. 120 illustrates the grouping tool interface with group codes 12001, shots that share characteristics 12002, for example shown as highlighted or otherwise selected, and a tab 12003 that adds selected shots to a group to be assigned together.

FIG. 121 illustrates a view of the grouping tool interface with a key select shot and associated color code, the depth complexity and associated color code and the clean plate complexity and associated color code according to one or more embodiments. FIG. 121 presents key select element 12101, depth master or complexity element 12102, and clean plate complexity element 12103, all using color codes. Element 12104 represents a key select and depth complexity master element with a respective color code. In at least one embodiment of the invention, the grouping tool interface 11801 includes a reference mask library tool with a plurality of reference masks, wherein each of the list of shots share selected metadata characteristics as described with reference to FIG. 120 above.

Once the selected shots have been grouped together, the reference masks may be assigned and checked out to all shots in a group that were assigned together as discussed with reference to FIG. 120 above. For example, selecting “Add Mask Lib Ref” opens an additional panel in the grouping tool interface 11801 to allow a user to search for masks that fit the assigned group a user is working on. For example, a user may select a reference that shares similar metadata to the selected shots, and right click on the metadata characteristics needed in the selected shots in order to add that metadata to the shots. In one or more embodiments, the reference mask library tool may present an interface to accept a selection of one or more of the plurality of reference masks to be utilized in shots in the list of shots that do not already utilize the reference mask associated with the subject, for example. Integration of the Mask Library Reference Tool within the grouping tool interface 11801 allows a user to search for masks across multiple projects. This may enable a worker to locate similar subjects in one or more additional shots and use reference masks associated with the subject located to add metadata to the one or more additional shots, without the need to reinvent metadata for each of those additional shots with the same subject such as a person, place or thing. In one or more embodiments, each one of the plurality of reference masks may be a dedicated template of the subject.

FIG. 121 also presents a drop down menu 12105 with different search options to enable a worker to build, search and sort timelines, for example to show the non-linear scenes in a movie viewing window or otherwise display images to work on, and a tab that refines search criteria 12106 to allow a user to search across multiple projects of a single motion picture, or several motion pictures. For example, the plurality of reference masks may be obtained from a second project associated with a second motion picture that differs from the first motion picture. This allows the system to create different sequels and/or motion pictures with similar subjects, such as people, places or things, from the first motion picture, with similar metadata to enable time efficiency, consistency and accuracy.

By way of one or more embodiments, the grouping tool interface may display a plurality of search results including the lists of shots associated with a plurality of respective selected metadata. In addition, in at least one embodiment of the invention, the grouping tool interface may present a timeline of the plurality of images associated with the list of shots. For example, two different search results may be presented to enable a user to see differences in shots or lists of shots associated with different sets of metadata, and add any necessary metadata if missing metadata is found.

In one or more embodiments, the computer may perform one or more of the following: present a first display to be viewed by a production worker that includes a search display with one or more of a context, a project, a shot, the list of shots, a status and an artist, present a second display to be viewed by an artist that includes at least one daily assignment having a context, project and shot or the list of shots or both, and present a third display to be viewed by an editorial worker that includes an annotation frame to accept commentary or drawing or both commentary and drawing on at least one of the plurality of images associated with the at least one shot or the list of shots or both. In one or more embodiments of the invention, the at least one shot or the list of shots, or both, include status related to progress of work performed.

FIG. 122 illustrates another view of the grouping tool interface according to one or more embodiments of the invention. As shown in FIG. 122, by way of one or more embodiments, the grouping tool interface 11801 may accept an input to designate the at least one shot as a master shot associated with depth, key selects or clean plate or any combination thereof. This enables the at least one shot to be utilized as a benchmark for quality or volume or to improve efficiency or both. For example, right clicking on shots in the search results window allows a user to set shots to be Key Selects, Depth Masters, Key Selects/Depth Masters and Clean Plate. For example, Key Selects, Depth Masters and Key Selects/Depth Masters may be benchmarks for quality and volume for a shot, and Clean Plate may set one shot as the reference, improving efficiency when other shots are clean plated within the assigned group.

FIG. 123 illustrates a close-up view of the grouping tool interface according to one or more embodiments, with metadata category columns 11802, text field 11803 to add additional metadata to the one or more metadata categories 11802 and metadata tags 11904, such as “NORA” under the subject category, and “WS” under the framing category.

FIG. 124 illustrates a close-up view of a shot of the grouping tool interface according to one or more embodiments, showing a search results column. Selected items in the search results column may also be loaded into a Frame Cycler timeline by clicking a “Build Timeline” drop down menu (not shown) that is also part of the grouping tool interface 11801. Built timelines may be referenced as base timelines within a Review Tool of the grouping tool interface 11801, and may be populated with the latest metadata to maintain quality, style, technique and efficiency when working within a sequential list of shots, non-sequential list of shots, over an entire motion picture, or several motion pictures if similar characteristics are shared.

As also discussed with reference to FIG. 99, a user may select an Open MSI (Master Shot Index) Option that will open an MSI tool for any shot or list of shots the user has selected from the search results column. In doing so, a user is able to enter notes, look at metadata tagged shots, look at original shots and check data integrity associated with each shot within the grouping tool interface 11801. This allows for an integrated view between data describing one or more shots and the entire shot metadata tagging process taking place.

FIG. 125 illustrates a close-up view of a plurality of shots to add metadata to according to one or more embodiments and in addition shows a popup of other metadata associated with a scene, including for example the number of frames in the shot, the reel on which the shot occurs, the locale of the shot, here “LIVINGROOM”, the subject(s) that appear in the scene, here “FREDERICK” and the type of framing, here “MS” or mid-shot.

FIG. 126 illustrates an overall view of the grouping tool interface with a shot from a plurality of images with metadata.

Embodiments enable a large studio workforce to work non-linearly on a film while maintaining a unified vision driven by key creative figures. Thus, work product is more consistent, higher quality, faster and less expensive, and project files, masks and other production elements can be reused across projects, since work is no longer constrained by shot order when using embodiments of the invention.

While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.

What is claimed is:
 1. A multi-stage production pipeline system for motion picture projects comprising: a computer; a database coupled with said computer, wherein said database comprises a shot table that comprises a shot identifier associated with a plurality of images that are ordered in time and that each make up at least one shot wherein said shot table comprises a starting frame value and an ending frame value associated with each of said shots; an asset table that comprises information on one or more assets used in production of said motion picture; wherein said plurality of images are associated with a motion picture, and said database further includes metadata associated with said at least one shot or associated with regions within said plurality of images in said at least one shot or both; wherein said computer is configured to calculate an amount of disk space that may be utilized to archive said one or more assets and signify at least one asset of said one or more assets that may be rebuilt from other assets to avoid archival of said at least one asset.
 2. The system of claim 1 wherein said computer is further configured to calculate an amount of disk space that may be saved by said avoid archival of said at least one asset.
 3. The system of claim 1 wherein said information on one or more assets comprises an indicator of whether an asset can be rebuilt from said other assets.
 4. The system of claim 1 wherein said information on one or more assets comprises a dependency graph that indicates which assets depend on which other assets.
 5. The system of claim 1 wherein said information on one or more assets comprises a compression value that indicates the extent to which an asset can be compressed.
 6. The system of claim 1 wherein said computer is further configured to calculate or estimate an amount of disk space that may be saved by compression of one or more of said one or more assets.
 7. The system of claim 1 wherein said computer is further configured to present a grouping tool interface coupled with said computer and said database; said grouping tool interface is configured to present user interface elements, accept input of said metadata and accept selected shots associated with said metadata via said user interface elements; store said metadata associated with said selected shots in said shot table; accept selected metadata to search said at least one shot; query said shot table with said selected metadata associated with said at least one shot or said regions within said plurality of images in said at least one shot or said both; and, display a list of shots having said selected metadata, wherein said list of shots includes at least one shot that is non-sequential in time in said motion picture with respect to another shot in said list of shots.
 8. The system of claim 7, wherein said metadata associated with said at least one shot is associated with a metadata category comprising a locale or location at which said shot was obtained.
 9. The system of claim 7, wherein said metadata associated with said at least one shot is associated with a metadata category comprising a subject that appears in said shot wherein said subject is a person, place or thing.
 10. The system of claim 7, wherein said metadata associated with said at least one shot is associated with a metadata category comprising a shot framing associated with said at least one shot.
 11. The system of claim 7, wherein said metadata associated with said at least one shot is associated with a metadata category comprising a depth complexity or a clean plate complexity associated with said shot.
 12. The system of claim 7, wherein said grouping tool interface is configured to accept at least one additional metadata category and additional metadata values associated with said metadata category.
 13. The system of claim 7, wherein said grouping tool interface is configured to accept an input to designate said at least one shot as a master shot associated with depth, key selects or clean plate or any combination thereof that enables said at least one shot to be utilized as a benchmark for quality or volume, or to improve efficiency, or both.
 14. The system of claim 9, wherein said grouping tool interface comprises a reference mask library tool with a plurality of reference masks, wherein each of said list of shots share said selected metadata, and wherein said reference mask library tool is configured to present an interface to accept a selection of one or more of said plurality of reference masks to be utilized in shots in said list of shots that do not already utilize said reference mask associated with said subject.
 15. The system of claim 14, wherein each one of said plurality of reference masks is configured as a dedicated template of said subject.
 16. The system of claim 14, wherein at least one of said plurality of reference masks is configured to be obtained from a second motion picture that differs from said motion picture.
 17. The system of claim 7, wherein said grouping tool interface is further configured to present a timeline of said plurality of images associated with said list of shots.
 18. The system of claim 7, wherein said computer is further configured to assign work tasks based on said list of shots.
 19. The system of claim 1, wherein said shot table in said database further comprises status related to progress of work performed on said at least one shot.
 20. The system of claim 1, wherein said database further comprises a task table which includes at least one task which comprises a task identifier and an assigned worker and which further comprises a context setting associated with a type of task related to motion picture work wherein said task includes at least definition of a region within said plurality of images, work on said region and composite work on said region and wherein said at least one task comprises a time allocated to complete said at least one task.
 21. The system of claim 20, wherein said database further comprises a project table, wherein said project table comprises a project identifier and description of a project related to said motion picture.
 22. The system of claim 21, wherein said database further comprises a timesheet item table which references said project identifier in said project table and said task identifier in said task table and which includes at least one timesheet item which comprises a start time and an end time.
 23. The system of claim 1, wherein said database further comprises an asset request table which comprises an asset request identifier and shot identifier.
 24. The system of claim 1, wherein said database further comprises a mask request table which comprises a mask request identifier and shot identifier.
 25. The system of claim 21, wherein said database further comprises a note table which comprises a note identifier and which references said project identifier and which comprises at least one note related to at least one of said plurality of images from said motion picture.
 26. The system of claim 1, wherein said database further comprises a snapshot table which comprises a snapshot identifier and search type and which includes a snapshot of said at least one shot that includes at least one location of a resource associated with said at least one shot.
 27. The system of claim 1, wherein said computer is further configured to present a first display configured to be viewed by a production worker that includes a search display comprising a context, project, shot, status and artist; present a second display configured to be viewed by an artist that includes at least one daily assignment having a context, project and shot; and, present a third display configured to be viewed by an editorial worker that includes an annotation frame configured to accept commentary or drawing or both commentary and drawing on at least one of said plurality of images associated with said at least one shot.
 28. The system of claim 27, wherein said computer is further configured to present an annotation overlaid on at least one of said plurality of images on said third display configured to be viewed by said editorial worker.
 29. The system of claim 22, wherein said computer is further configured to calculate actuals based on total time spent for all of said at least one tasks associated with all of said at least one shot in said project.
 30. The system of claim 29, wherein said computer is further configured to compare said actuals to time allocated for all of said at least one tasks associated with all of said at least one shot in said project; based on said compare said actuals to time allocated for all of said at least one tasks, estimate one or more of remaining cost for said project; time of completion of said project.